GALA: Generating Animatable Layered Assets from a Single Scan
January 23, 2024
Authors: Taeksoo Kim, Byungjun Kim, Shunsuke Saito, Hanbyul Joo
cs.AI
Abstract
We present GALA, a framework that takes as input a single-layer clothed 3D
human mesh and decomposes it into complete multi-layered 3D assets. The outputs
can then be combined with other assets to create novel clothed human avatars
with any pose. Existing reconstruction approaches often treat clothed humans as
a single layer of geometry and overlook the inherent compositionality of humans
with hairstyles, clothing, and accessories, thereby limiting the utility of the
meshes for downstream applications. Decomposing a single-layer mesh into
separate layers is a challenging task because it requires the synthesis of
plausible geometry and texture for the severely occluded regions. Moreover,
even with successful decomposition, meshes are not normalized in terms of poses
and body shapes, which prevents coherent composition with novel identities and poses.
To address these challenges, we propose to leverage the general knowledge of a
pretrained 2D diffusion model as a geometry and appearance prior for humans and
other assets. We first separate the input mesh using the 3D surface
segmentation extracted from multi-view 2D segmentations. Then we synthesize the
missing geometry of different layers in both posed and canonical spaces using a
novel pose-guided Score Distillation Sampling (SDS) loss. Once the
high-fidelity 3D geometry has been inpainted, we apply the same SDS loss to its
texture to obtain the complete appearance, including the initially occluded
regions. Through a series of decomposition steps, we obtain multiple layers of
3D assets in a shared canonical space normalized in terms of poses and human
shapes, hence supporting effortless composition with novel identities and
reanimation with novel poses. Our experiments demonstrate the effectiveness of
our approach for decomposition, canonicalization, and composition tasks
compared to existing solutions.
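The pose-guided SDS loss mentioned above builds on the standard Score Distillation Sampling gradient, which noises a differentiable render of the 3D asset and uses a frozen 2D diffusion model's noise prediction as the update direction. A minimal sketch, assuming an epsilon-prediction model is available as `noise_pred_fn` (a stand-in for the pretrained 2D diffusion model; the paper's pose guidance and weighting details are omitted):

```python
import torch

def sds_loss(render, noise_pred_fn, alphas_cumprod, t):
    """One SDS step (sketch, not the paper's exact formulation).

    render          -- differentiably rendered image (B, C, H, W) of the asset
    noise_pred_fn   -- frozen diffusion model's epsilon prediction (assumption)
    alphas_cumprod  -- cumulative noise schedule, indexed by timestep t
    """
    a_t = alphas_cumprod[t]
    noise = torch.randn_like(render)
    # Forward diffusion: noise the render to timestep t.
    noisy = a_t.sqrt() * render + (1.0 - a_t).sqrt() * noise
    with torch.no_grad():                      # no backprop through the U-Net
        eps_hat = noise_pred_fn(noisy, t)
    w = 1.0 - a_t                              # a common weighting choice
    grad = w * (eps_hat - noise)               # SDS gradient w.r.t. the render
    # Surrogate loss whose gradient w.r.t. `render` equals `grad`.
    return (grad.detach() * render).sum()
```

In practice this gradient flows through the renderer into the geometry or texture parameters, which is what lets a 2D model supervise 3D inpainting of occluded regions.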
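The first step, lifting multi-view 2D segmentations to a 3D surface segmentation, can be illustrated as a per-face majority vote across views. A sketch under assumed inputs (rasterized face-id maps and matching 2D label maps per view; all names hypothetical, not the paper's implementation):

```python
import numpy as np

def aggregate_labels(face_ids_per_view, labels_per_view, n_faces, n_labels):
    """Per-face majority vote over multi-view 2D segmentations (sketch).

    face_ids_per_view -- per view, flat array of rasterized face indices
                         (-1 marks background pixels)
    labels_per_view   -- per view, flat array of 2D segmentation labels
    """
    votes = np.zeros((n_faces, n_labels), dtype=np.int64)
    for face_ids, labels in zip(face_ids_per_view, labels_per_view):
        valid = face_ids >= 0                       # drop background pixels
        np.add.at(votes, (face_ids[valid], labels[valid]), 1)
    return votes.argmax(axis=1)                     # winning label per face
```

Each mesh face then carries a layer label (e.g. body, garment, hair), which is what allows the input mesh to be separated before the occluded regions are synthesized.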