WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

May 25, 2026
Authors: Kaining Ying, Hengrui Hu, Siyu Ren, Jiamu Li, Fengjiao Chen, Ziwen Wang, Xuezhi Cao, Xunliang Cai, Henghui Ding
cs.AI

Abstract

Interactive world models are advancing rapidly, yet existing benchmarks each cover only part of the required capabilities, leaving no unified standard for systematic evaluation. To fill this gap, we introduce WBench, a comprehensive multi-turn benchmark that evaluates interactive world models along five dimensions: video quality, setting adherence, interaction adherence, consistency, and physics compliance. WBench contains 289 test cases and 1,058 interaction turns. Each case specifies a world setting and a multi-turn interaction sequence; together the cases cover diverse scenes, styles, and subjects, both first- and third-person perspectives, and four interaction types: navigation, subject action, event editing, and perspective switching. For navigation, WBench unifies text, 6-DoF pose, and discrete-action control, so that models with different native input interfaces can be evaluated on the same cases. Evaluation uses 22 automatic sub-metrics that combine specialist vision models with large multimodal models, and all metrics are validated against human judgments. Experiments on 20 state-of-the-art models show that no single model performs strongly across all dimensions. We provide detailed diagnostic analysis of each model's characteristic strengths and weaknesses and of the open challenges that remain. Code and data are available at https://github.com/meituan-longcat/WBench.
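To make the benchmark structure described in the abstract concrete, the following is a minimal sketch of how one test case and its per-dimension scores might be represented. It is not the schema used in the WBench repository: the class names, field names, and the unweighted-mean aggregation are all assumptions made for illustration. Only the five dimension names, the four interaction types, and the three navigation control forms come from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import Optional


class InteractionType(Enum):
    """The four interaction types named in the abstract."""
    NAVIGATION = "navigation"
    SUBJECT_ACTION = "subject_action"
    EVENT_EDITING = "event_editing"
    PERSPECTIVE_SWITCHING = "perspective_switching"


# The five evaluation dimensions named in the abstract.
DIMENSIONS = (
    "video_quality",
    "setting_adherence",
    "interaction_adherence",
    "consistency",
    "physics_compliance",
)


@dataclass
class Turn:
    """One interaction turn (hypothetical layout, not the real WBench schema)."""
    interaction_type: InteractionType
    text: str                                  # natural-language instruction
    pose_6dof: Optional[tuple] = None          # assumed (x, y, z, roll, pitch, yaw)
    discrete_action: Optional[str] = None      # assumed label, e.g. "forward"


@dataclass
class WorldCase:
    """A world setting plus its multi-turn interaction sequence (hypothetical)."""
    world_setting: str
    perspective: str                           # "first_person" or "third_person"
    turns: list[Turn] = field(default_factory=list)


def dimension_scores(sub_metrics: dict[str, dict[str, float]]) -> dict[str, float]:
    """Collapse sub-metric scores into one score per dimension.

    The unweighted mean is an assumption; the abstract does not state
    how the 22 sub-metrics are aggregated.
    """
    return {dim: mean(scores.values()) for dim, scores in sub_metrics.items()}


case = WorldCase(
    world_setting="A rainy street at night",   # made-up example setting
    perspective="first_person",
    turns=[
        Turn(InteractionType.NAVIGATION, "walk forward", discrete_action="forward"),
        Turn(InteractionType.EVENT_EDITING, "the rain stops"),
    ],
)
# Sub-metric names and values below are placeholders, not reported results.
scores = dimension_scores({"consistency": {"metric_a": 0.5, "metric_b": 1.0}})
```

Carrying the text, pose, and discrete-action forms of a navigation turn side by side is one way a single case could serve models with different input interfaces. The totals reported in the abstract imply an average of roughly 3.7 turns per case (1,058 / 289).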