Learning Disentangled Avatars with Hybrid 3D Representations

Abstract

Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations have been heavily studied for holistic modeling and capture of the whole human (e.g., body, clothing, face and hair). However, neither representation is optimal in terms of representation efficacy, since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications of DELTA: the first disentangles the human body from clothing, and the second disentangles the face from hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing are fully disentangled yet jointly rendered. Such disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
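The abstract does not spell out how a mesh is integrated into volume rendering. The following is a minimal sketch, assuming the mesh is treated as an opaque surface that terminates each camera ray: radiance-field samples in front of the first ray-mesh intersection are alpha-composited as in standard NeRF rendering, and whatever transmittance remains is assigned to the mesh color (or to the background if the ray misses the mesh). The function name `render_ray_with_mesh`, its arguments, and the per-ray formulation are hypothetical illustrations, not DELTA's actual code. The sketch is written in plain Python for readability; a real renderer would express the same operations in an autodiff framework so that gradients reach both the mesh and the radiance field.

```python
import math
from typing import Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray_with_mesh(
    depths: Sequence[float],
    densities: Sequence[float],
    colors: Sequence[Color],
    mesh_depth: Optional[float],
    mesh_color: Color,
    background: Color = (0.0, 0.0, 0.0),
) -> Color:
    """Hypothetical sketch: composite NeRF samples along one ray, stopping at an opaque mesh.

    depths must be sorted in increasing order. mesh_depth is the distance to the
    first ray-mesh intersection, or None if the ray misses the mesh.
    """
    n = len(depths)
    if len(densities) != n or len(colors) != n:
        raise ValueError("depths, densities and colors must have equal length")

    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for i in range(n):
        t = depths[i]
        if mesh_depth is not None and t >= mesh_depth:
            break  # samples behind the opaque mesh are occluded
        # Interval to the next sample; the last interval is treated as nearly
        # unbounded, following the usual NeRF convention.
        t_next = depths[i + 1] if i + 1 < n else t + 1e10
        if mesh_depth is not None:
            t_next = min(t_next, mesh_depth)  # clip the interval at the mesh surface
        alpha = 1.0 - math.exp(-densities[i] * (t_next - t))
        weight = transmittance * alpha
        for c in range(3):
            out[c] += weight * colors[i][c]
        transmittance *= 1.0 - alpha

    # Light that passes through the radiance-field samples lands on the mesh
    # (if the ray hits it) or on the background.
    tail = mesh_color if mesh_depth is not None else background
    for c in range(3):
        out[c] += transmittance * tail[c]
    return (out[0], out[1], out[2])
```

Because the mesh and the radiance field enter only through separate arguments, one could swap in a different body mesh (a different `mesh_depth` and `mesh_color`) while keeping the clothing samples fixed, which loosely mirrors the clothing-transfer idea. A real system would also need to deform the radiance field to the new body shape, which this sketch does not model.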
