OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
September 15, 2025
Authors: Yang Zhou, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Haoyu Guo, Zizun Li, Kaijing Ma, Xinyue Li, Yating Wang, Haoyi Zhu, Mingyu Liu, Dingning Liu, Jiange Yang, Zhoujie Fu, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Kaipeng Zhang, Tong He
cs.AI
Abstract
The field of 4D world modeling, which aims to jointly capture spatial geometry
and temporal dynamics, has witnessed remarkable progress in recent years,
driven by advances in large-scale generative models and multimodal learning.
However, the development of truly general 4D world models remains fundamentally
constrained by the availability of high-quality data. Existing datasets and
benchmarks often lack the dynamic complexity, multi-domain diversity, and
spatial-temporal annotations required to support key tasks such as 4D geometric
reconstruction, future prediction, and camera-control video generation. To
address this gap, we introduce OmniWorld, a large-scale, multi-domain,
multi-modal dataset specifically designed for 4D world modeling. OmniWorld
consists of a newly collected OmniWorld-Game dataset and several curated public
datasets spanning diverse domains. Compared with existing synthetic datasets,
OmniWorld-Game provides richer modality coverage, larger scale, and more
realistic dynamic interactions. Based on this dataset, we establish a
challenging benchmark that exposes the limitations of current state-of-the-art
(SOTA) approaches in modeling complex 4D environments. Moreover, fine-tuning
existing SOTA methods on OmniWorld leads to significant performance gains
across 4D reconstruction and video generation tasks, strongly validating
OmniWorld as a powerful resource for training and evaluation. We envision
OmniWorld as a catalyst for accelerating the development of general-purpose 4D
world models, ultimately advancing machines' holistic understanding of the
physical world.