4Diffusion: Multi-view Video Diffusion Model for 4D Generation

Abstract

Current 4D generation methods achieve strong results with the help of advanced diffusion generative models. However, they lack multi-view spatial-temporal modeling and struggle to integrate the diverse prior knowledge of multiple diffusion models, which leads to temporally inconsistent appearance and flickering. In this paper, we propose 4Diffusion, a novel 4D generation pipeline that aims to generate spatial-temporally consistent 4D content from a monocular video. We first design a unified diffusion model tailored for multi-view video generation by incorporating a learnable motion module into a frozen 3D-aware diffusion model, so that it captures multi-view spatial-temporal correlations. After training on a curated dataset, our diffusion model acquires reasonable temporal consistency and inherently preserves the generalizability and spatial consistency of the 3D-aware diffusion model. Subsequently, we propose a 4D-aware Score Distillation Sampling (SDS) loss, based on our multi-view video diffusion model, to optimize a 4D representation parameterized by a dynamic NeRF. This design aims to eliminate the discrepancies that arise from multiple diffusion models, allowing the generation of spatial-temporally consistent 4D content. Moreover, we devise an anchor loss to enhance appearance details and facilitate the learning of the dynamic NeRF. Extensive qualitative and quantitative experiments demonstrate that our method achieves superior performance compared to previous methods.
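
For orientation, the Score Distillation Sampling loss mentioned above builds on the standard SDS gradient introduced in DreamFusion, written out below. The abstract does not give the 4D-aware form, so the reading here is only an assumption: x is taken to be a grid of images rendered from the dynamic NeRF with parameters \theta across several camera views and time steps, y is the input monocular video used as conditioning, and \hat{\epsilon}_\phi is the noise prediction of the multi-view video diffusion model. The symbols w(t), \alpha_t, and \sigma_t follow the usual diffusion notation and are not taken from this text.

```latex
% Standard SDS gradient (DreamFusion). The 4D-aware reading of x and y
% is an assumption, not the paper's stated formulation.
\begin{aligned}
x_t &= \alpha_t\, x + \sigma_t\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  &= \mathbb{E}_{t,\,\epsilon}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Under this reading, a single diffusion model scores all views and frames jointly, which is consistent with the stated goal of avoiding discrepancies between multiple diffusion priors.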
