Fast Registration of Photorealistic Avatars for VR Facial Animation

Abstract

Virtual Reality (VR) holds the promise of social interactions that feel more immersive than those in other media. Key to this is the ability to accurately animate a photorealistic avatar of one's likeness while wearing a VR headset. Although high-quality registration of person-specific avatars to headset-mounted camera (HMC) images is possible in an offline setting, the performance of generic real-time models is significantly degraded. Online registration is also challenging due to oblique camera views and differences in modality. In this work, we first show that the domain gap between the avatar and headset-camera images is one of the primary sources of difficulty: a transformer-based architecture achieves high accuracy on domain-consistent data but degrades when the domain gap is reintroduced. Building on this finding, we develop a system design that decouples the problem into two parts: 1) an iterative refinement module that takes in-domain inputs, and 2) a generic avatar-guided image-to-image style transfer module that is conditioned on the current estimates of expression and head pose. These two modules reinforce each other: image style transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal helps registration. Our system produces high-quality results efficiently, obviating the need for costly offline registration to generate personalized labels. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over direct regression methods as well as offline registration.
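The alternation between the two modules can be illustrated with a small toy sketch. It is not the system described in the abstract: the learned networks are replaced by hand-written stand-ins, and every name below (`DECODER`, `render`, `style_transfer`, `refine`, `register`) is hypothetical. The sketch assumes a linear "avatar decoder" and a domain gap that is only an unknown brightness offset. The style-transfer stand-in is conditioned on a render of the current estimate, and the refinement stand-in only ever sees in-domain images.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_PIXELS, LATENT_DIM = 64, 4

# Assumption: a random linear map stands in for the avatar decoder/renderer.
DECODER = rng.normal(size=(NUM_PIXELS, LATENT_DIM))
DECODER_PINV = np.linalg.pinv(DECODER)


def render(latent):
    """Toy avatar renderer: latent code -> flattened avatar-domain image."""
    return DECODER @ latent


def style_transfer(hmc_image, conditioning_render):
    """Toy domain-gap removal, conditioned on a render of the current estimate.

    It shifts the HMC image so its mean brightness matches the conditioning
    render. The closer the estimate is to the truth, the more accurate this is.
    """
    return hmc_image - hmc_image.mean() + conditioning_render.mean()


def refine(latent, in_domain_image):
    """One in-domain refinement step: least-squares update of the latent."""
    residual = in_domain_image - render(latent)
    return latent + DECODER_PINV @ residual


def register(hmc_image, num_iters=20):
    """Alternate style transfer and refinement, starting from a neutral guess."""
    latent = np.zeros(LATENT_DIM)
    for _ in range(num_iters):
        in_domain = style_transfer(hmc_image, render(latent))
        latent = refine(latent, in_domain)
    return latent


true_latent = rng.normal(size=LATENT_DIM)
# Assumption: the domain gap is a constant brightness offset of 5.0.
hmc_image = render(true_latent) + 5.0

# Baseline: regress the latent directly from the out-of-domain image.
direct = DECODER_PINV @ hmc_image
iterative = register(hmc_image)

direct_error = np.linalg.norm(direct - true_latent)
iterative_error = np.linalg.norm(iterative - true_latent)
print(f"direct regression error: {direct_error:.3e}")
print(f"iterative registration error: {iterative_error:.3e}")
```

In this toy setting, direct regression stays biased by the brightness offset, while the alternating loop shrinks the error at every iteration, because a better estimate yields a better brightness match and a better match yields a better estimate. The real modules are learned networks, so this only mirrors the control flow, not their behavior.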
