A Benchmark for Interactive World Models with a Unified Action Generation Framework

Abstract

Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, and interactive world models provide scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks for evaluating the physical interaction capabilities of these models. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset of 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather conditions, and scenes. Because existing world models differ in their interaction modalities, we introduce an Action Generation Framework to unify evaluation and design six task types, generating 4.9k test samples. These tasks jointly assess model performance across visual generation, trajectory following, and memory. Evaluating 14 representative world models, we identify key limitations and provide insights for future research. The iWorld-Bench model leaderboard is publicly available at iWorld-Bench.com.
