WEAVER: Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract

World models (WMs, i.e., learned simulators) could have a far-reaching impact on robotics by enabling policy evaluation, policy improvement, and test-time planning, all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity (i.e., producing simulated trajectories that correlate with reality), (ii) consistency (i.e., producing simulated trajectories that remain coherent over long horizons), and (iii) efficiency (i.e., producing simulated trajectories quickly). We propose WEAVER (World Estimation Across Views for Embodied Reasoning), a WM architecture that achieves all three desiderata simultaneously and provides state-of-the-art results on robotic manipulation tasks. WEAVER is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives that are required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply WEAVER on robotic hardware and demonstrate its effectiveness at policy evaluation (correlation of ρ=0.870 with real-world success rate), policy improvement (a 38% improvement in real-world success rate on top of the π_{0.5} robot foundation model), and test-time planning (a 14% improvement in real-world success rate with a 5-10x speedup over prior WMs). WEAVER also outperforms prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos: https://arnavkj1995.github.io/WEAVER/
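The abstract states only that WEAVER predicts future latents and reward values via a flow-matching loss; it does not give the formulation. As a rough illustration of what such an objective can look like, the following minimal sketch uses the common straight-line (linear interpolation) variant of flow matching. Everything here is an assumption made for illustration, not the authors' implementation: the function names `flow_matching_pair` and `flow_matching_loss`, the choice of Gaussian noise as the source distribution, and the idea of treating the target `x1` as a flat vector holding a future latent and a reward value.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Build one training pair on the straight path from x0 (noise) to x1 (target).

    Returns the interpolated point x_t = (1 - t) * x0 + t * x1 and the
    constant velocity x1 - x0 that a model would be trained to regress.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def flow_matching_loss(velocity_fn, x1, rng):
    """Single-sample estimate of the flow-matching loss for one target x1.

    velocity_fn(x_t, t) is a hypothetical stand-in for the learned model.
    x1 is assumed (for illustration) to hold a future latent and a reward.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample (assumed source)
    t = rng.random()                        # time drawn uniformly from [0, 1)
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = velocity_fn(x_t, t)
    # Mean squared error between predicted and target velocity
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(x1)
```

In this variant, a model that always predicts zero velocity incurs a loss equal to the mean squared distance between the noise and the target, while a model that recovers `x1 - x0` exactly drives the loss to zero. At sampling time, one would integrate the learned velocity from noise toward a predicted latent and reward; the number of integration steps is one of the factors that determines how quickly such a model generates trajectories.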