

RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

February 13, 2026
Authors: Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zhang, Weinan Zhang, Chao Yu, Yu Wang
cs.AI

Abstract

Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real Co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and π0.5, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on π0.5. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.
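The two-stage recipe in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The abstract does not name the RL algorithm, the real/sim mixing ratio, or how the auxiliary loss is weighted, so this sketch assumes a REINFORCE-style surrogate for the RL stage and a negative log-likelihood loss for the real-data anchor. All names (`Demo`, `mixed_sft_batch`, `rl_co_loss`, `real_fraction`, `aux_weight`) are hypothetical.

```python
"""Toy sketch of a two-stage sim-real co-training objective.

Hypothetical names throughout; the RL surrogate and loss weighting are
assumptions, not details taken from the RL-Co paper.
"""
import random
from dataclasses import dataclass


@dataclass
class Demo:
    """One demonstration step; `source` is "real" or "sim"."""
    obs: list
    action: list
    source: str


def mixed_sft_batch(real, sim, batch_size, real_fraction, rng):
    """Stage 1: draw a warm-start SFT batch mixing real and simulated demos.

    `real_fraction` (assumed hyperparameter) sets the share of real samples.
    """
    n_real = round(batch_size * real_fraction)
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(sim) for _ in range(batch_size - n_real)]
    rng.shuffle(batch)
    return batch


def policy_gradient_loss(log_probs, advantages):
    """Assumed RL objective on simulated rollouts (REINFORCE-style surrogate).

    Minimizing it raises the log-probability of positive-advantage actions.
    """
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)


def sft_loss(real_log_probs):
    """Negative log-likelihood of real-world demonstration actions."""
    return -sum(real_log_probs) / len(real_log_probs)


def rl_co_loss(sim_log_probs, sim_advantages, real_log_probs, aux_weight):
    """Stage 2: RL loss in simulation plus a weighted auxiliary SFT loss on real data.

    The SFT term anchors the policy to real demonstrations to limit forgetting;
    `aux_weight` is an assumed trade-off coefficient.
    """
    rl_term = policy_gradient_loss(sim_log_probs, sim_advantages)
    return rl_term + aux_weight * sft_loss(real_log_probs)


if __name__ == "__main__":
    rng = random.Random(0)
    real = [Demo([0.1], [1.0], "real")]
    sim = [Demo([0.2], [0.5], "sim"), Demo([0.3], [0.0], "sim")]
    batch = mixed_sft_batch(real, sim, batch_size=8, real_fraction=0.25, rng=rng)
    print(sum(d.source == "real" for d in batch))  # 2 real samples out of 8
    print(rl_co_loss([-1.0, -2.0], [1.0, 0.5], [-0.5, -1.5], aux_weight=0.1))
```

Summing the two terms into one scalar is only one plausible design. An implementation could instead alternate RL and SFT updates or schedule `aux_weight` over training. In every variant, the real-data term keeps the policy from drifting away from real-world behavior while it optimizes reward in simulation.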
February 17, 2026