Efficient 3D Articulated Human Generation with Layered Surface Volumes
July 11, 2023
Authors: Yinghao Xu, Wang Yifan, Alexander W. Bergman, Menglei Chai, Bolei Zhou, Gordon Wetzstein
cs.AI
Abstract
Access to high-quality and diverse 3D articulated digital human assets is
crucial in various applications, ranging from virtual reality to social
platforms. Generative approaches, such as 3D generative adversarial networks
(GANs), are rapidly replacing laborious manual content creation tools. However,
existing 3D GAN frameworks typically rely on scene representations that
leverage either template meshes, which are fast but offer limited quality, or
volumes, which offer high capacity but are slow to render, thereby limiting the
3D fidelity in GAN settings. In this work, we introduce layered surface volumes
(LSVs) as a new 3D object representation for articulated digital humans. LSVs
represent a human body using multiple textured mesh layers around a
conventional template. These layers are rendered using alpha compositing with
fast differentiable rasterization, and they can be interpreted as a volumetric
representation that allocates its capacity to a manifold of finite thickness
around the template. Unlike conventional single-layer templates that struggle
with representing fine off-surface details like hair or accessories, our
surface volumes naturally capture such details. LSVs can be articulated, and
they exhibit exceptional efficiency in GAN settings, where a 2D generator
learns to synthesize the RGBA textures for the individual layers. Trained on
unstructured, single-view 2D image datasets, our LSV-GAN generates high-quality
and view-consistent 3D articulated digital humans without the need for
view-inconsistent 2D upsampling networks.
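
The abstract states that the textured layers are rendered with alpha compositing. As a rough illustration of that one step only, the following is a minimal sketch of front-to-back compositing for a single pixel. It assumes the RGBA samples have already been rasterized from each layer and sorted nearest layer first; the function name `composite_front_to_back` and the tuple layout are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Sequence, Tuple

# Straight (non-premultiplied) color with alpha in [0, 1]. Assumed layout.
RGBA = Tuple[float, float, float, float]


def composite_front_to_back(samples: Sequence[RGBA]) -> Tuple[List[float], float]:
    """Alpha-composite the RGBA samples of one pixel, nearest layer first.

    Returns the accumulated RGB color and the accumulated opacity.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer layers
    for r, g, b, a in samples:
        weight = transmittance * a  # contribution of this layer to the pixel
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        transmittance *= 1.0 - a  # light remaining for the layers behind
    return color, 1.0 - transmittance


# A half-transparent red outer layer in front of an opaque blue inner layer.
pixel_color, pixel_opacity = composite_front_to_back(
    [(1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)]
)
```

Because each layer contributes in proportion to the light that nearer layers let through, a semi-transparent outer layer (for example, hair) can blend with the layers beneath it rather than fully hiding them.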