Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Abstract

Video-based world models have developed along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench comprises two key components: Omni-WorldSuite, a systematic prompt suite spanning diverse interaction levels and scene types; and Omni-Metrics, an agent-based evaluation framework that quantifies world modeling capability by measuring the causal impact of interaction actions on both final outcomes and intermediate state-evolution trajectories. We conduct extensive evaluations of 18 representative world models across multiple paradigms. Our analysis reveals critical limitations of current world models in interactive response and provides actionable insights for future research. Omni-WorldBench will be publicly released to foster progress in interactive 4D world modeling.
