OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
September 15, 2025
Authors: Yang Zhou, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Haoyu Guo, Zizun Li, Kaijing Ma, Xinyue Li, Yating Wang, Haoyi Zhu, Mingyu Liu, Dingning Liu, Jiange Yang, Zhoujie Fu, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Kaipeng Zhang, Tong He
cs.AI
Abstract
The field of 4D world modeling - aiming to jointly capture spatial geometry
and temporal dynamics - has witnessed remarkable progress in recent years,
driven by advances in large-scale generative models and multimodal learning.
However, the development of truly general 4D world models remains fundamentally
constrained by the availability of high-quality data. Existing datasets and
benchmarks often lack the dynamic complexity, multi-domain diversity, and
spatial-temporal annotations required to support key tasks such as 4D geometric
reconstruction, future prediction, and camera-control video generation. To
address this gap, we introduce OmniWorld, a large-scale, multi-domain,
multi-modal dataset specifically designed for 4D world modeling. OmniWorld
consists of a newly collected OmniWorld-Game dataset and several curated public
datasets spanning diverse domains. Compared with existing synthetic datasets,
OmniWorld-Game provides richer modality coverage, larger scale, and more
realistic dynamic interactions. Based on this dataset, we establish a
challenging benchmark that exposes the limitations of current state-of-the-art
(SOTA) approaches in modeling complex 4D environments. Moreover, fine-tuning
existing SOTA methods on OmniWorld leads to significant performance gains
across 4D reconstruction and video generation tasks, strongly validating
OmniWorld as a powerful resource for training and evaluation. We envision
OmniWorld as a catalyst for accelerating the development of general-purpose 4D
world models, ultimately advancing machines' holistic understanding of the
physical world.