Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Abstract

Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench comprises two key components: Omni-WorldSuite, a systematic prompt suite spanning diverse interaction levels and scene types; and Omni-Metrics, an agent-based evaluation framework that quantifies world modeling capabilities by measuring the causal impact of interaction actions on both final outcomes and intermediate state evolution trajectories. We conduct extensive evaluations of 18 representative world models across multiple paradigms. Our analysis reveals critical limitations of current world models in interactive response, providing actionable insights for future research. Omni-WorldBench will be publicly released to foster progress in interactive 4D world modeling.
