4DNeX: Feed-Forward 4D Generative Modeling Made Easy
August 18, 2025
Authors: Zhaoxi Chen, Tianqi Liu, Long Zhuo, Jiawei Ren, Zeng Tao, He Zhu, Fangzhou Hong, Liang Pan, Ziwei Liu
cs.AI
Abstract
We present 4DNeX, the first feed-forward framework for generating 4D (i.e.,
dynamic 3D) scene representations from a single image. In contrast to existing
methods that rely on computationally intensive optimization or require
multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D
generation by fine-tuning a pretrained video diffusion model. Specifically:
1) to alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale
dataset with high-quality 4D annotations generated using advanced
reconstruction approaches; 2) we introduce a unified 6D video representation
that jointly models RGB and XYZ sequences, facilitating structured learning of
both appearance and geometry; and 3) we propose a set of simple yet effective
adaptation strategies to repurpose pretrained video diffusion models for 4D
modeling. 4DNeX produces high-quality dynamic point clouds that enable
novel-view video synthesis. Extensive experiments demonstrate that 4DNeX
outperforms existing 4D generation methods in efficiency and generalizability,
offering a scalable solution for image-to-4D modeling and laying the foundation
for generative 4D world models that simulate dynamic scene evolution.
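
To make the representation concrete, below is a minimal, hypothetical Python sketch of what a unified 6D video (per-frame RGB plus per-pixel XYZ) and its interpretation as a dynamic point cloud could look like. The function names, array layout, and channel ordering are illustrative assumptions based on the abstract, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a "unified 6D video" read as
# per-frame RGB images stacked with per-pixel XYZ pointmaps of matching
# resolution. All names and shapes here are hypothetical.
import numpy as np

def make_6d_video(rgb_frames: np.ndarray, xyz_frames: np.ndarray) -> np.ndarray:
    """Stack RGB and XYZ channels into a single (T, H, W, 6) video tensor.

    rgb_frames: (T, H, W, 3) float array in [0, 1].
    xyz_frames: (T, H, W, 3) per-pixel 3D coordinates in a shared world frame.
    """
    assert rgb_frames.shape == xyz_frames.shape
    return np.concatenate([rgb_frames, xyz_frames], axis=-1)

def to_dynamic_point_cloud(video6d: np.ndarray):
    """Interpret each frame of the 6D video as a colored point cloud.

    Returns one (points, colors) pair per frame -- the kind of dynamic
    point cloud output the abstract describes.
    """
    clouds = []
    for frame in video6d:                    # frame: (H, W, 6)
        rgb = frame[..., :3].reshape(-1, 3)  # per-point colors
        xyz = frame[..., 3:].reshape(-1, 3)  # per-point 3D positions
        clouds.append((xyz, rgb))
    return clouds

# Example usage with random data standing in for model outputs.
T, H, W = 8, 64, 64
rgb = np.random.rand(T, H, W, 3).astype(np.float32)
xyz = np.random.randn(T, H, W, 3).astype(np.float32)
video6d = make_6d_video(rgb, xyz)            # (8, 64, 64, 6)
clouds = to_dynamic_point_cloud(video6d)     # 8 frames of (4096,) colored points
```

Under this reading, each pixel carries both a color and a world-space position, so every generated frame unprojects directly into a colored point cloud, which is consistent with the novel-view video synthesis the abstract claims the dynamic point clouds enable.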