

GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

December 1, 2025
作者: Yunfei Li, Xiao Ma, Jiafeng Xu, Yu Cui, Zhongren Cui, Zhigang Han, Liqun Huang, Tao Kong, Yuxiao Liu, Hao Niu, Wanli Peng, Jingchao Qiao, Zeyu Ren, Haixin Shi, Zhi Su, Jiawen Tian, Yuyang Xiao, Shenyu Zhang, Liwei Zheng, Hang Li, Yonghui Wu
cs.AI

Abstract

We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies rest on the core assumption that human demonstrations are optimal; we argue that in highly dexterous, high-precision manipulation tasks, human demonstrations are in fact noisy and suboptimal. GR-RL therefore uses a multi-stage training pipeline that filters, augments, and reinforces the demonstrations with reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and keeps only the transitions that contribute positively to progress; specifically, we show that when offline RL is applied directly with a sparse reward, the resulting Q-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation, which greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behavior for high-precision control, we perform online RL by learning a latent-space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets, achieving an 83.3% success rate on a task that requires long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL is a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
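Two of the pipeline's data-side steps can be illustrated with a minimal sketch. This is not the paper's implementation: `q_fn`, `filter_by_progress`, `mirror_augment`, and `flip_sign_dims` are hypothetical names, the states here are toy scalars/vectors rather than vision-language observations, and a real robot's symmetry transform would also swap left/right joint groups rather than only negating lateral axes.

```python
import numpy as np

def filter_by_progress(trajectory, q_fn, margin=0.0):
    """Demonstration filtering (sketch): keep only transitions whose
    learned progress estimate increases. q_fn stands in for a value
    derived from the offline-RL Q-function with sparse reward."""
    kept = []
    for (s, a, s_next) in trajectory:
        # A transition "contributes positively" if progress rises.
        if q_fn(s_next) - q_fn(s) > margin:
            kept.append((s, a, s_next))
    return kept

def mirror_augment(transition, flip_sign_dims):
    """Morphological symmetry augmentation (sketch): reflect a
    transition across the robot's left-right plane by negating the
    lateral state/action dimensions given in flip_sign_dims."""
    s, a, s_next = transition
    def mirror(x):
        x = np.array(x, dtype=float).copy()
        x[flip_sign_dims] *= -1.0
        return x
    return (mirror(s), mirror(a), mirror(s_next))

# Toy usage: 1-D states where progress equals the state value itself.
traj = [(0.0, "a0", 0.2), (0.2, "a1", 0.1), (0.1, "a2", 0.5)]
clean = filter_by_progress(traj, q_fn=lambda s: s)
# The backtracking middle transition is dropped; two transitions remain.
```

The filtering step is where the abstract's claim about noisy demonstrations becomes operational: rather than imitating every human action, the policy is trained only on the subset of transitions the sparse-reward Q-function scores as forward progress.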