WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
May 25, 2026
Authors: Kaining Ying, Hengrui Hu, Siyu Ren, Jiamu Li, Fengjiao Chen, Ziwen Wang, Xuezhi Cao, Xunliang Cai, Henghui Ding
cs.AI
Abstract
Interactive world models are advancing rapidly, yet existing benchmarks cover only part of the required competencies, leaving no unified standard for systematic evaluation. To fill this gap, we introduce WBench, a comprehensive multi-turn benchmark that evaluates interactive world models along five dimensions: video quality, setting adherence, interaction adherence, consistency, and physics compliance. WBench contains 289 test cases and 1,058 interaction turns. Each case specifies a world setting and a multi-turn interaction sequence. The cases cover diverse scenes, styles, and subjects, both first- and third-person perspectives, and four interaction types: navigation, subject action, event editing, and perspective switching. For navigation, WBench unifies text, 6-DoF pose, and discrete-action control, enabling evaluation of models with different native input interfaces. Evaluation uses 22 automatic sub-metrics that combine specialist vision models with large multimodal models, and all metrics are validated against human judgments. Across 20 state-of-the-art models, we find that no single model performs strongly across all dimensions. We provide detailed diagnostic insights into the characteristic strengths, weaknesses, and open challenges of each model. Code and data are available at https://github.com/meituan-longcat/WBench.
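To make the case structure described in the abstract concrete, the following is a minimal Python sketch of how one test case (a world setting plus a multi-turn interaction sequence) might be represented. It is an illustration only, not the schema used in the WBench repository: the class names `Turn` and `BenchmarkCase`, all field names, the helper `count_turns`, and the example instructions are assumptions. Only the four interaction types, the two perspectives, and the five dimension names come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InteractionType(Enum):
    # The four interaction types named in the abstract.
    NAVIGATION = "navigation"
    SUBJECT_ACTION = "subject_action"
    EVENT_EDITING = "event_editing"
    PERSPECTIVE_SWITCHING = "perspective_switching"


class Perspective(Enum):
    # The two viewpoints named in the abstract.
    FIRST_PERSON = "first_person"
    THIRD_PERSON = "third_person"


# The five evaluation dimensions named in the abstract.
DIMENSIONS = (
    "video_quality",
    "setting_adherence",
    "interaction_adherence",
    "consistency",
    "physics_compliance",
)


@dataclass
class Turn:
    # Hypothetical layout: one interaction turn with a free-text instruction.
    interaction_type: InteractionType
    instruction: str


@dataclass
class BenchmarkCase:
    # Hypothetical layout: a world setting plus an ordered sequence of turns.
    case_id: str
    world_setting: str
    perspective: Perspective
    turns: List[Turn] = field(default_factory=list)


def count_turns(cases: List[BenchmarkCase]) -> int:
    """Return the total number of interaction turns across the given cases."""
    return sum(len(case.turns) for case in cases)


# Made-up example case, for illustration only.
example = BenchmarkCase(
    case_id="demo-001",
    world_setting="A rainy city street at night, rendered in a realistic style.",
    perspective=Perspective.FIRST_PERSON,
    turns=[
        Turn(InteractionType.NAVIGATION, "Walk forward to the crossing."),
        Turn(InteractionType.EVENT_EDITING, "The rain stops and the sky clears."),
    ],
)
```

Under this layout, the reported totals of 289 cases and 1,058 turns correspond to an average of roughly 3.7 turns per case.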