4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

Abstract

4D reconstruction of equines (e.g., horses) from monocular video is important for animal welfare. Previous mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over a whole video, which is time-consuming and sensitive to incomplete observations. In this work, we propose a novel framework, 4DEquine, which disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage that regresses smooth, pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To support training, we create a large-scale synthetic motion dataset, VarenPoser, which features high-quality surface motions and diverse camera trajectories, as well as a synthetic appearance dataset, VarenTex, comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the advantages of 4DEquine and our new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/
