
Learning Disentangled Avatars with Hybrid 3D Representations

September 12, 2023
Authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black
cs.AI

Abstract

Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars~(DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input, and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications for DELTA. In the first, we consider the disentanglement of the human body and clothing, and in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing can be fully disentangled yet jointly rendered. Such a disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-sourced pipeline for the study of hybrid human avatar modeling.
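The abstract's key technical idea, integrating an explicit mesh into volumetric rendering, can be illustrated with a minimal per-ray sketch: accumulate neural-field samples in front of the ray/mesh intersection, then let the remaining transmittance hit the opaque mesh surface. This is a generic illustration of mesh-aware volume compositing, not DELTA's actual renderer; all function and variable names here are hypothetical.

```python
import numpy as np

def hybrid_render_ray(sigmas, colors, deltas, t_vals, mesh_color, mesh_depth):
    """Composite NeRF-style samples with an opaque mesh along one ray.

    sigmas:     (n,) densities at the sample points (illustrative values)
    colors:     (n, 3) RGB at the sample points
    deltas:     (n,) distances between consecutive samples
    t_vals:     (n,) sample depths along the ray
    mesh_color: (3,) RGB of the mesh at the ray/mesh intersection
    mesh_depth: depth of that intersection (np.inf if the ray misses the mesh)
    """
    # Only samples in front of the mesh surface contribute volume density.
    mask = t_vals < mesh_depth
    alphas = 1.0 - np.exp(-sigmas[mask] * deltas[mask])
    # Transmittance before each sample (standard volume rendering quadrature).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))  # length n+1
    weights = trans[:-1] * alphas
    rgb = (weights[:, None] * colors[mask]).sum(axis=0)
    # Whatever light is not absorbed by the volume reaches the mesh surface.
    if np.isfinite(mesh_depth):
        rgb = rgb + trans[-1] * mesh_color
    return rgb
```

Because every step is differentiable (in an autodiff framework rather than plain numpy), gradients from a photometric loss can flow to both the volumetric layer (clothing/hair) and the mesh surface (body/face), which is what allows such a hybrid model to be trained from monocular video without 3D supervision.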