Learning Disentangled Avatars with Hybrid 3D Representations
September 12, 2023
Authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black
cs.AI
Abstract
Tremendous efforts have been made to learn animatable and photorealistic
human avatars. Towards this end, both explicit and implicit 3D representations
are heavily studied for a holistic modeling and capture of the whole human
(e.g., body, clothing, face and hair), but neither representation is an optimal
choice in terms of representation efficacy since different parts of the human
avatar have different modeling desiderata. For example, meshes are generally
not suitable for modeling clothing and hair. Motivated by this, we present
Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit
3D representations. DELTA takes a monocular RGB video as input, and produces a
human avatar with separate body and clothing/hair layers. Specifically, we
demonstrate two important applications of DELTA: in the first, we consider the
disentanglement of the human body and clothing; in the second, we disentangle
the face and hair. To do so, DELTA represents the body or face
with an explicit mesh-based parametric 3D model and the clothing or hair with
an implicit neural radiance field. To make this possible, we design an
end-to-end differentiable renderer that integrates meshes into volumetric
rendering, enabling DELTA to learn directly from monocular videos without any
3D supervision. Finally, we show how these two applications can be easily
combined to model full-body avatars, such that the hair, face, body and
clothing can be fully disentangled yet jointly rendered. Such a disentanglement
enables hair and clothing transfer to arbitrary body shapes. We empirically
validate the effectiveness of DELTA's disentanglement by demonstrating its
promising performance on disentangled reconstruction, virtual clothing try-on
and hairstyle transfer. To facilitate future research, we also release an
open-sourced pipeline for the study of hybrid human avatar modeling.
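The core rendering idea described above, integrating a mesh into volumetric rendering, can be illustrated per ray: volume samples (e.g., for hair or clothing) in front of the rasterized mesh surface are alpha-composited as in standard NeRF rendering, and the mesh color receives whatever transmittance remains. The sketch below is a minimal, hypothetical NumPy illustration of this compositing step, not the paper's actual implementation; all function and parameter names are illustrative.

```python
import numpy as np

def hybrid_render_ray(sigmas, colors, deltas, ts, mesh_depth, mesh_color):
    """Composite NeRF samples with a rasterized mesh surface along one ray.

    sigmas:     (N,)   volume densities at the sample points (hypothetical NeRF output)
    colors:     (N, 3) RGB at the sample points
    deltas:     (N,)   distances between consecutive samples
    ts:         (N,)   depths of the samples along the ray
    mesh_depth: scalar depth of the rasterized mesh hit (np.inf if no hit)
    mesh_color: (3,)   shaded mesh RGB at the hit point
    """
    # Only volume samples lying in front of the mesh surface contribute.
    mask = ts < mesh_depth
    alphas = 1.0 - np.exp(-sigmas * deltas)
    alphas = np.where(mask, alphas, 0.0)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    # Whatever transmittance survives all volume samples hits the mesh.
    t_rest = trans[-1] * (1.0 - alphas[-1])
    if np.isfinite(mesh_depth):
        rgb = rgb + t_rest * mesh_color
    return rgb
```

With zero density everywhere the ray returns the mesh color unchanged; with an opaque volume sample in front of the mesh, the mesh is fully occluded, which is the behavior that lets the body layer and the clothing/hair layer be rendered jointly yet remain disentangled.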