HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors

Abstract

In this paper, we present a novel 3D head avatar creation approach that generalizes from few-shot in-the-wild data while delivering high fidelity and robust animatability. Because this problem is underconstrained, incorporating prior knowledge is essential. We therefore propose a framework with two phases: prior learning and avatar creation. The prior learning phase derives 3D head priors from a large-scale multi-view dynamic dataset, and the avatar creation phase applies these priors to few-shot personalization. Our approach captures the priors with a Gaussian Splatting-based auto-decoder network that uses part-based dynamic modeling. To learn the attributes of the Gaussian primitives, the network combines an encoding shared across identities with a personalized latent code for each individual identity. During the avatar creation phase, inversion and fine-tuning strategies enable fast head avatar personalization. Extensive experiments demonstrate that our model effectively exploits head priors and successfully generalizes them to few-shot personalization, achieving photo-realistic rendering quality, multi-view consistency, and stable animation.
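The inversion and fine-tuning strategy mentioned for the avatar creation phase can be illustrated with a deliberately tiny sketch. This is not the HeadGAP implementation: the abstract does not specify the network, losses, or schedule, so everything below is an assumption made for illustration. A small linear map `W_prior` stands in for the shared auto-decoder learned in the prior phase, a plain vector `target` stands in for the few-shot observations, and there is no Gaussian Splatting renderer at all. The hypothetical helper `personalize` first optimizes only the personalized latent code `z` with the decoder frozen (inversion), then updates the decoder and the code jointly (fine-tuning).

```python
def decode(W, z):
    """Toy shared 'decoder': maps a latent code z to an attribute vector."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def mse(W, z, target):
    """Mean squared error between decoded attributes and the observation."""
    pred = decode(W, z)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def gd_step(W, z, target, lr, update_decoder):
    """One gradient-descent step on z, and on W as well if update_decoder is True."""
    n, d = len(target), len(z)
    pred = decode(W, z)
    # Gradient of the MSE with respect to each predicted attribute.
    g_pred = [2.0 * (p - t) / n for p, t in zip(pred, target)]
    g_z = [sum(g_pred[i] * W[i][j] for i in range(n)) for j in range(d)]
    if update_decoder:
        # Gradient w.r.t. W[i][j] is g_pred[i] * z[j] (computed with the old z).
        W = [[W[i][j] - lr * g_pred[i] * z[j] for j in range(d)] for i in range(n)]
    z = [zj - lr * g for zj, g in zip(z, g_z)]
    return W, z


def personalize(W_prior, target, latent_dim, inv_steps=200, ft_steps=200, lr=0.1):
    """Assumed two-stage schedule: inversion (z only), then joint fine-tuning."""
    W = [row[:] for row in W_prior]  # copy so the shared prior stays untouched
    z = [0.0] * latent_dim
    for _ in range(inv_steps):       # stage 1: decoder frozen
        W, z = gd_step(W, z, target, lr, update_decoder=False)
    loss_after_inversion = mse(W, z, target)
    for _ in range(ft_steps):        # stage 2: decoder and code updated together
        W, z = gd_step(W, z, target, lr, update_decoder=True)
    return W, z, loss_after_inversion, mse(W, z, target)


if __name__ == "__main__":
    W_prior = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    target = [0.5, -0.2, 0.9, 0.1]
    _, z, loss_inv, loss_ft = personalize(W_prior, target, latent_dim=2)
    print("latent code:", z)
    print("loss after inversion:", loss_inv)
    print("loss after fine-tuning:", loss_ft)
```

In this toy setting, inversion can only reach the best fit available inside the frozen prior, and fine-tuning then reduces the remaining error by adapting the decoder itself. That mirrors the division of labor the abstract describes, without claiming anything about the actual architecture or results.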