NPGA: Neural Parametric Gaussian Avatars

Abstract

The creation of high-fidelity digital versions of human heads is an important stepping stone toward further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem due to the high demand for photorealism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method on 3D Gaussian Splatting, which provides highly efficient rendering and inherits the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM) instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations, which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud with per-primitive latent features that govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and the predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 dB PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
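Rasterizing Gaussians requires moving each primitive from canonical space into the posed, expression-dependent space (a forward map), whereas a backward field such as NPHM's maps posed-space query points back to canonical space. The abstract does not say how the distillation is carried out, so the following is only a rough illustration of the property a forward field F should satisfy, B(F(x_c)) = x_c. It rests on loudly stated assumptions: `backward_field` is a hypothetical analytic toy rather than NPHM, the forward targets come from fixed-point iteration, and a quadratic least-squares model stands in for the learned forward network a real system would train.

```python
import numpy as np


def backward_field(x_posed: np.ndarray, expr: float) -> np.ndarray:
    """Hypothetical toy backward deformation (posed space -> canonical space).

    Assumed form B(x) = x - d(x), where d is a small, smooth offset scaled by a
    scalar expression code. NPHM's real backward field is a learned network.
    """
    offset = 0.1 * expr * np.sin(x_posed[:, [1, 2, 0]])
    return x_posed - offset


def invert_backward(x_canon: np.ndarray, expr: float, iters: int = 50) -> np.ndarray:
    """Solve B(x_posed) = x_canon per point by fixed-point iteration.

    The update x <- x - (B(x) - x_canon) equals x <- x_canon + d(x). It converges
    here because the Jacobian of d has norm at most 0.1 * |expr|, which is < 1.
    """
    x = x_canon.copy()  # initial guess: no deformation
    for _ in range(iters):
        x = x - (backward_field(x, expr) - x_canon)
    return x


def quadratic_features(x: np.ndarray) -> np.ndarray:
    """Per-point features [1, x, y, z, x^2, y^2, z^2, xy, yz, zx]."""
    ones = np.ones((x.shape[0], 1))
    cross = x * x[:, [1, 2, 0]]
    return np.hstack([ones, x, x ** 2, cross])


rng = np.random.default_rng(0)
expr = 0.8                                           # hypothetical expression code
x_canon = rng.uniform(-1.0, 1.0, size=(500, 3))      # toy canonical Gaussian centers

# 1) Forward targets: where each canonical point lands for this expression.
x_posed = invert_backward(x_canon, expr)
cycle_error = np.abs(backward_field(x_posed, expr) - x_canon).max()

# 2) "Distill" into an explicit forward model that can displace Gaussians directly.
features = quadratic_features(x_canon)
weights, *_ = np.linalg.lstsq(features, x_posed, rcond=None)
x_forward = features @ weights

rms_forward = np.sqrt(np.mean((x_forward - x_posed) ** 2))
rms_identity = np.sqrt(np.mean((x_canon - x_posed) ** 2))  # baseline: ignore deformation
print(f"cycle error {cycle_error:.2e}, RMS fit {rms_forward:.4f} vs identity {rms_identity:.4f}")
```

Because the least-squares features include the identity map, the distilled forward model can never fit worse (in RMS) than ignoring the deformation; a neural forward field trained with a cycle-style loss would play the same role with far more capacity.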
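The Laplacian regularization on per-primitive latent features and predicted dynamics can be illustrated with a graph Laplacian over the canonical Gaussian centers: each primitive's value is compared with the mean of its spatial neighbours, so spatially incoherent signals are penalized. This is a minimal sketch under assumptions not stated in the abstract: the uniform-weight k-nearest-neighbour graph, the squared-norm form, the neighbourhood size `k`, the weights `lambda_latent` and `lambda_dynamics`, and the helper names `knn_indices` and `laplacian_penalty` are all hypothetical.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    diff = points[:, None, :] - points[None, :, :]   # (N, N, 3) pairwise differences
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)      # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)                   # a point is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]           # (N, k)


def laplacian_penalty(values: np.ndarray, neighbours: np.ndarray) -> float:
    """Mean squared norm of the uniform graph Laplacian of a per-primitive signal.

    values:     (N, D) per-Gaussian quantity, e.g. latent features or predicted offsets.
    neighbours: (N, k) neighbour indices, e.g. from knn_indices.
    """
    neighbour_mean = values[neighbours].mean(axis=1)  # (N, D)
    residual = values - neighbour_mean                # value minus mean of its neighbours
    return float(np.mean(np.sum(residual ** 2, axis=1)))


# Toy canonical point cloud: 200 Gaussian centers in 3D.
rng = np.random.default_rng(0)
centers = rng.normal(size=(200, 3))
nbrs = knn_indices(centers, k=8)

# Hypothetical 16-D per-primitive latents and 3-D predicted offsets ("dynamics").
smooth_latents = centers @ rng.normal(size=(3, 16))  # varies smoothly over space
noisy_latents = rng.normal(size=(200, 16))           # independent per Gaussian
offsets = 0.01 * np.sin(centers)                      # smooth toy dynamics

# Hypothetical weights for the two regularization terms.
lambda_latent, lambda_dynamics = 1e-2, 1e-1
dyn_term = lambda_dynamics * laplacian_penalty(offsets, nbrs)
reg_smooth = lambda_latent * laplacian_penalty(smooth_latents, nbrs) + dyn_term
reg_noisy = lambda_latent * laplacian_penalty(noisy_latents, nbrs) + dyn_term
print(f"smooth latents: {reg_smooth:.4f}  noisy latents: {reg_noisy:.4f}")
```

The dense pairwise-distance computation is only suitable for toy sizes; for hundreds of thousands of Gaussians a spatial index (for example a k-d tree) would be the natural replacement.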
