Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
March 23, 2026
Authors: Meiqi Wu, Zhixin Cai, Fufangchen Zhao, Xiaokun Feng, Rujing Dang, Bingze Song, Ruitian Tian, Jiashu Zhu, Jiachen Lei, Hao Dou, Jing Tang, Lei Sun, Jiahong Wu, Xiangxiang Chu, Zeming Liu, Kaiqi Huang
cs.AI
Abstract
Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench comprises two key components: Omni-WorldSuite, a systematic prompt suite spanning diverse interaction levels and scene types; and Omni-Metrics, an agent-based evaluation framework that quantifies world modeling capabilities by measuring the causal impact of interaction actions on both final outcomes and intermediate state evolution trajectories. We conduct extensive evaluations of 18 representative world models across multiple paradigms. Our analysis reveals critical limitations of current world models in interactive response, providing actionable insights for future research. Omni-WorldBench will be publicly released to foster progress in interactive 4D world modeling.
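The abstract does not specify how Omni-Metrics turns "causal impact" into a number, so the following is only a hypothetical illustration of the idea, not the benchmark's method. It assumes each generated rollout has already been reduced to per-frame state vectors (for example, tracked object positions extracted by some perception agent). It then treats the difference between a rollout with the interaction and one without it as the action's effect, and scores that effect against an expected effect both at the final frame and across every intermediate frame. The names `action_effect`, `closeness`, `interaction_scores`, and the 1/(1 + MAE) scoring rule are all assumptions made for this sketch.

```python
from typing import Dict, List, Sequence

# Hypothetical representation (not from the paper): one rollout is a list of
# per-frame state vectors, e.g. tracked object positions from generated frames.
Rollout = Sequence[Sequence[float]]


def action_effect(with_action: Rollout, without_action: Rollout) -> List[List[float]]:
    """Per-frame effect of the interaction: state with action minus state without it."""
    if len(with_action) != len(without_action):
        raise ValueError("rollouts must have the same number of frames")
    return [[a - b for a, b in zip(sa, sb)] for sa, sb in zip(with_action, without_action)]


def closeness(x: Sequence[float], y: Sequence[float]) -> float:
    """Map the mean absolute error between two vectors to a score in (0, 1]."""
    if not x or len(x) != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mae = sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return 1.0 / (1.0 + mae)


def interaction_scores(with_action: Rollout, without_action: Rollout,
                       expected_effect: Rollout) -> Dict[str, float]:
    """Score the action's effect at the final frame and averaged over all frames."""
    effect = action_effect(with_action, without_action)
    if not effect or len(effect) != len(expected_effect):
        raise ValueError("expected effect must cover every frame")
    final_outcome = closeness(effect[-1], expected_effect[-1])
    trajectory = sum(closeness(e, x) for e, x in zip(effect, expected_effect)) / len(effect)
    return {"final_outcome": final_outcome, "trajectory": trajectory}


if __name__ == "__main__":
    # A pushed ball should move one unit per frame; this model teleports it to the end.
    no_push = [[0.0], [0.0], [0.0]]
    pushed = [[0.0], [0.0], [2.0]]
    expected = [[0.0], [1.0], [2.0]]
    print(interaction_scores(pushed, no_push, expected))  # final_outcome 1.0, trajectory ~0.83
```

The toy case shows why scoring intermediate states matters alongside the final outcome: a model that places the ball in the right final position while skipping the motion in between receives a perfect final-outcome score but a lower trajectory score.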