WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching: they could enable policy evaluation, policy improvement, and test-time planning, all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity (i.e., producing simulated trajectories that correlate with reality), (ii) consistency (i.e., producing simulated trajectories that are coherent over long horizons), and (iii) efficiency (i.e., producing simulated trajectories quickly). We propose WEAVER (World Estimation Across Views for Embodied Reasoning), a WM architecture that simultaneously achieves all three desiderata and provides state-of-the-art results on robotic manipulation tasks. WEAVER is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply WEAVER on robot hardware, demonstrating its effectiveness at policy evaluation (ρ=0.870 correlation with real-world success rate), policy improvement (real-world success rate improvement of 38% on top of the π_{0.5} robot foundation model), and test-time planning (real-world success rate improvement of 14% with a 5-10× speedup over prior WMs). WEAVER also demonstrates better performance than prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos are available at: https://arnavkj1995.github.io/WEAVER/
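As a rough illustration of the policy-evaluation metric, the sketch below computes a correlation between success rates measured in simulation and success rates measured on hardware. It assumes ρ denotes the Pearson correlation coefficient, which the abstract does not specify (a rank correlation would be equally plausible). The function name `pearson_correlation` and the success-rate values are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of floats."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-policy success rates, made up for illustration only:
# one entry per policy, estimated in the world model and on the real robot.
simulated_success = [0.10, 0.35, 0.50, 0.80]
real_success = [0.15, 0.30, 0.60, 0.75]

rho = pearson_correlation(simulated_success, real_success)
print(f"correlation between simulated and real success: {rho:.3f}")
```

A value near 1 would indicate that ranking policies by simulated success closely tracks their real-world ranking, which is the property a world model needs in order to be useful for policy evaluation.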