RISE: Self-Improving Robot Policy with Compositional World Model
February 11, 2026
Authors: Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li
cs.AI
Abstract
Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risks, hardware costs, and the difficulty of environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future states via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state prediction and value estimation to use distinct architectures and objectives, each best suited to its role. These components are integrated into a closed-loop self-improving pipeline that continuously generates imagined rollouts, estimates advantages, and updates the policy in imagination space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art, with absolute performance gains of more than 35% in dynamic brick sorting, 45% in backpack packing, and 35% in box closing.
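The closed-loop pipeline described in the abstract (imagine rollouts with the dynamics model, score them with the progress value model, and improve the policy using the resulting advantages) can be illustrated with a minimal toy sketch. All function names, the scalar "progress" dynamics, and the candidate-selection update below are illustrative assumptions, not the paper's implementation; in particular, the policy update is reduced to picking the candidate policy with the highest imagined advantage rather than an RL gradient step.

```python
# Toy sketch of imagination-based self-improvement (illustrative only).
# State is a scalar "task progress"; the goal state is 1.0.

def dynamics_model(state, action):
    """Stand-in for the controllable dynamics model: action nudges progress."""
    return state + action

def progress_value(state):
    """Stand-in for the progress value model: closer to the goal is better."""
    return -abs(1.0 - state)

def imagined_rollout(policy, state, horizon):
    """Roll the policy forward inside the world model (no real robot)."""
    states = [state]
    for _ in range(horizon):
        states.append(dynamics_model(states[-1], policy(states[-1])))
    return states

def advantages(states):
    """Per-step advantage = value gained by taking that step."""
    return [progress_value(s1) - progress_value(s0)
            for s0, s1 in zip(states, states[1:])]

def improve(candidate_gains, state=0.0, horizon=5):
    """Toy policy update: keep the candidate with the best imagined return."""
    def imagined_return(g):
        policy = lambda s: g * (1.0 - s)  # simple proportional controller
        return sum(advantages(imagined_rollout(policy, state, horizon)))
    return max(candidate_gains, key=imagined_return)
```

Because both rollouts and evaluation happen inside the learned model, the loop can run indefinitely without physical interaction, which is the scalability argument the abstract makes.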