4DNeX: Feed-Forward 4D Generative Modeling Made Easy
August 18, 2025
Authors: Zhaoxi Chen, Tianqi Liu, Long Zhuo, Jiawei Ren, Zeng Tao, He Zhu, Fangzhou Hong, Liang Pan, Ziwei Liu
cs.AI
Abstract
We present 4DNeX, the first feed-forward framework for generating 4D (i.e.,
dynamic 3D) scene representations from a single image. In contrast to existing
methods that rely on computationally intensive optimization or require
multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D
generation by fine-tuning a pretrained video diffusion model. Specifically, 1)
to alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale
dataset with high-quality 4D annotations generated using advanced
reconstruction approaches. 2) We introduce a unified 6D video representation
that jointly models RGB and XYZ sequences, facilitating structured learning of
both appearance and geometry. 3) We propose a set of simple yet effective
adaptation strategies to repurpose pretrained video diffusion models for 4D
modeling. 4DNeX produces high-quality dynamic point clouds that enable
novel-view video synthesis. Extensive experiments demonstrate that 4DNeX
outperforms existing 4D generation methods in efficiency and generalizability,
offering a scalable solution for image-to-4D modeling and laying the foundation
for generative 4D world models that simulate dynamic scene evolution.
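To make point 2) concrete, here is a minimal sketch of what a unified 6D video representation could look like: each frame pairs an RGB image with a per-pixel XYZ pointmap, concatenated along the channel axis so a video diffusion model can denoise appearance and geometry jointly. The `make_6d_video` helper, the tensor shapes, and the channel layout are illustrative assumptions, not the authors' released code.

```python
# Sketch (assumed, not the paper's implementation) of assembling a 6D video:
# RGB frames plus per-frame XYZ pointmaps, stacked channel-wise.
import torch

def make_6d_video(rgb: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Stack RGB and XYZ sequences into one 6-channel video.

    rgb: (T, 3, H, W) frames, values in [0, 1]
    xyz: (T, 3, H, W) per-pixel world coordinates (one pointmap per frame)
    returns: (T, 6, H, W) tensor modeling appearance and geometry jointly
    """
    assert rgb.shape == xyz.shape and rgb.shape[1] == 3
    return torch.cat([rgb, xyz], dim=1)

# Example: a 16-frame clip at 256x256 resolution.
video_6d = make_6d_video(torch.rand(16, 3, 256, 256),
                         torch.randn(16, 3, 256, 256))
print(video_6d.shape)  # torch.Size([16, 6, 256, 256])
```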
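The abstract also states that the generated dynamic point clouds enable novel-view video synthesis. As a hedged illustration of that step (the paper does not specify this renderer), the sketch below splats one time step of a colored point cloud through a pinhole camera using far-to-near painter's-algorithm overwriting; `render_frame` and its signature are hypothetical.

```python
# Illustrative novel-view rendering of a dynamic point cloud frame;
# an assumed pinhole projection, not the paper's rendering pipeline.
import torch

def render_frame(xyz, rgb, K, R, t, H, W):
    """Splat one time step of a colored point cloud into a novel view.

    xyz: (N, 3) world-space points; rgb: (N, 3) colors in [0, 1]
    K: (3, 3) pinhole intrinsics; R: (3, 3), t: (3,) world-to-camera pose
    Returns an (H, W, 3) image; pixels hit by no point stay black.
    """
    cam = xyz @ R.T + t                        # world -> camera coordinates
    z = cam[:, 2]
    keep = z > 1e-6                            # drop points behind the camera
    cam, rgb, z = cam[keep], rgb[keep], z[keep]
    uv = (cam @ K.T)[:, :2] / z[:, None]       # perspective projection
    u = uv[:, 0].round().long()
    v = uv[:, 1].round().long()
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, rgb = u[inside], v[inside], z[inside], rgb[inside]
    img = torch.zeros(H, W, 3)
    order = torch.argsort(z, descending=True)  # far-to-near: near overwrites
    img[v[order], u[order]] = rgb[order]
    return img

# Example: render a random cloud from an identity-pose camera.
img = render_frame(torch.randn(1000, 3) + torch.tensor([0., 0., 3.]),
                   torch.rand(1000, 3),
                   torch.tensor([[128., 0., 128.],
                                 [0., 128., 128.],
                                 [0., 0., 1.]]),
                   torch.eye(3), torch.zeros(3), 256, 256)
```

Repeating this projection per frame and per target camera yields the novel-view videos the abstract describes; a production renderer would typically use proper z-buffering or differentiable splatting instead of simple overwriting.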