UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Abstract

Recent advances in 3D avatar generation have attracted significant attention. These breakthroughs aim to produce more realistic, animatable avatars, narrowing the gap between virtual and real-world experiences. Most existing works employ a Score Distillation Sampling (SDS) loss, combined with a differentiable renderer and a text condition, to guide a diffusion model in generating 3D avatars. However, SDS often produces oversmoothed results with few facial details and lacks diversity compared with ancestral sampling. Other works generate 3D avatars from a single image, but unwanted lighting effects, perspective views, and inferior image quality make it difficult to reliably reconstruct 3D face meshes with aligned, complete textures. In this paper, we propose a novel 3D avatar generation approach, termed UltrAvatar, with enhanced geometric fidelity and superior-quality physically based rendering (PBR) textures free of unwanted lighting. To this end, the proposed approach presents a diffuse color extraction model and an authenticity guided texture diffusion model. The former removes unwanted lighting effects to reveal the true diffuse colors, so that the generated avatars can be rendered under various lighting conditions. The latter follows two forms of gradient-based guidance to generate PBR textures that render diverse face-identity features and details better aligned with the 3D mesh geometry. We demonstrate the effectiveness and robustness of the proposed method, which outperforms state-of-the-art methods by a large margin in our experiments.

