4Diffusion: Multi-view Video Diffusion Model for 4D Generation

Abstract

Current 4D generation methods have achieved notable results with the aid of advanced diffusion generative models. However, these methods lack multi-view spatio-temporal modeling and struggle to integrate the diverse prior knowledge of multiple diffusion models, which leads to temporally inconsistent appearance and flickering. In this paper, we propose a novel 4D generation pipeline, named 4Diffusion, that generates spatio-temporally consistent 4D content from a monocular video. We first design a unified diffusion model tailored for multi-view video generation. It incorporates a learnable motion module into a frozen 3D-aware diffusion model to capture multi-view spatio-temporal correlations. After training on a curated dataset, our diffusion model acquires reasonable temporal consistency while inherently preserving the generalizability and spatial consistency of the 3D-aware diffusion model. Subsequently, we propose a 4D-aware Score Distillation Sampling (SDS) loss, based on our multi-view video diffusion model, to optimize a 4D representation parameterized by a dynamic NeRF. Because it relies on a single unified model, this loss aims to eliminate the discrepancies that arise from combining multiple diffusion models, enabling the generation of spatio-temporally consistent 4D content. Moreover, we devise an anchor loss to enhance appearance details and facilitate the learning of the dynamic NeRF. Extensive qualitative and quantitative experiments demonstrate that our method outperforms previous methods.
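The 4D-aware SDS loss builds on standard Score Distillation Sampling, the technique popularized by DreamFusion. Standard SDS noises a rendering x to x_t = sqrt(ā_t)·x + sqrt(1 − ā_t)·ε, asks a frozen diffusion model for its noise estimate ε̂, and uses w(t)·(ε̂ − ε) as the gradient on x. The minimal sketch below shows that gradient computed over a whole multi-view clip at once. It assumes the rendered clip is a (views, frames, C, H, W) array and that one timestep and one noise sample are shared by every view and frame, which is one plausible reading of "a single unified model". The weighting w(t) = 1 − ā_t, the timestep range, and the names `sds_grad` and `oracle_denoiser` are all hypothetical. This is an illustration, not the authors' implementation.

```python
import numpy as np


def sds_grad(x, denoiser, alphas_cumprod, rng, t_range=(0.02, 0.98)):
    """Standard SDS gradient w.r.t. a rendered multi-view clip x.

    x: array of shape (V, F, C, H, W), i.e. V camera views and F time steps
       (an assumed layout). denoiser(x_t, t) is a hypothetical stand-in for a
       multi-view video diffusion model that returns predicted noise.
    One timestep and one noise tensor are shared by all views and frames,
    so the whole clip is scored jointly by a single model.
    """
    T = len(alphas_cumprod)
    t = int(rng.integers(int(t_range[0] * T), int(t_range[1] * T)))
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps  # forward diffusion
    eps_hat = denoiser(x_t, t)
    w = 1.0 - a_bar  # assumed weighting w(t); other choices are common
    return w * (eps_hat - eps)  # gradient of the SDS loss w.r.t. x


# Toy check with an "oracle" denoiser that knows a target clip. In this case
# the SDS gradient reduces to a positive multiple of (x - target), so plain
# gradient descent pulls x toward the target. A real pipeline would instead
# backpropagate this gradient through a dynamic NeRF renderer.
betas = np.linspace(1e-4, 0.02, 1000)  # linear DDPM-style schedule (assumed)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 3, 16, 16))  # 4 views x 8 frames


def oracle_denoiser(x_t, t):
    a_bar = alphas_cumprod[t]
    return (x_t - np.sqrt(a_bar) * target) / np.sqrt(1.0 - a_bar)


x = np.zeros_like(target)
for _ in range(500):
    x -= 0.1 * sds_grad(x, oracle_denoiser, alphas_cumprod, rng)
```

Sharing one timestep and one noise draw across the clip is the design choice the sketch highlights. A single model then scores all views and frames together, rather than separate 3D-aware and video models pulling different views or frames in different directions.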
