Efficient 3D Articulated Human Generation with Layered Surface Volumes
July 11, 2023
Authors: Yinghao Xu, Wang Yifan, Alexander W. Bergman, Menglei Chai, Bolei Zhou, Gordon Wetzstein
cs.AI
Abstract
Access to high-quality and diverse 3D articulated digital human assets is
crucial in various applications, ranging from virtual reality to social
platforms. Generative approaches, such as 3D generative adversarial networks
(GANs), are rapidly replacing laborious manual content creation tools. However,
existing 3D GAN frameworks typically rely on scene representations that
leverage either template meshes, which are fast but offer limited quality, or
volumes, which offer high capacity but are slow to render, thereby limiting the
3D fidelity in GAN settings. In this work, we introduce layered surface volumes
(LSVs) as a new 3D object representation for articulated digital humans. LSVs
represent a human body using multiple textured mesh layers around a
conventional template. These layers are rendered using alpha compositing with
fast differentiable rasterization, and they can be interpreted as a volumetric
representation that allocates its capacity to a manifold of finite thickness
around the template. Unlike conventional single-layer templates that struggle
with representing fine off-surface details like hair or accessories, our
surface volumes naturally capture such details. LSVs can be articulated, and
they exhibit exceptional efficiency in GAN settings, where a 2D generator
learns to synthesize the RGBA textures for the individual layers. Trained on
unstructured, single-view 2D image datasets, our LSV-GAN generates high-quality
and view-consistent 3D articulated digital humans without the need for
view-inconsistent 2D upsampling networks.
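The core rendering step described above, compositing the rasterized RGBA textures of the mesh layers into a single image, follows the standard front-to-back "over" operator. Below is a minimal NumPy sketch of that compositing step; the function name composite_layers, the layer ordering, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def composite_layers(rgba_layers):
    # rgba_layers: (K, H, W, 4) array of rasterized layer textures,
    # ordered front (index 0) to back, with RGB and alpha in [0, 1].
    K, H, W, _ = rgba_layers.shape
    out_rgb = np.zeros((H, W, 3))
    transmittance = np.ones((H, W, 1))  # fraction of light not yet absorbed by nearer layers
    for k in range(K):
        rgb = rgba_layers[k, ..., :3]
        alpha = rgba_layers[k, ..., 3:4]
        out_rgb += transmittance * alpha * rgb   # front-to-back "over" accumulation
        transmittance *= (1.0 - alpha)
    return out_rgb, 1.0 - transmittance[..., 0]  # composite image and coverage mask

# Toy usage: three hypothetical 256x256 RGBA layers.
layers = np.random.rand(3, 256, 256, 4)
image, coverage = composite_layers(layers)
```

In the GAN setting described in the abstract, the per-layer RGBA textures would come from the 2D generator and the rasterization of each layer would be differentiable, so a compositing step of this form keeps the whole pipeline end-to-end trainable.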