

GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

December 1, 2025
作者: Yunfei Li, Xiao Ma, Jiafeng Xu, Yu Cui, Zhongren Cui, Zhigang Han, Liqun Huang, Tao Kong, Yuxiao Liu, Hao Niu, Wanli Peng, Jingchao Qiao, Zeyu Ren, Haixin Shi, Zhi Su, Jiawen Tian, Yuyang Xiao, Shenyu Zhang, Liwei Zheng, Hang Li, Yonghui Wu
cs.AI

Abstract

We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. The assumption that human demonstrations are optimal is core to existing VLA policies; however, we argue that in highly dexterous, precision-demanding manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations with reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and keeps only the transitions that contribute positively to progress. Specifically, we show that by directly applying offline RL with a sparse reward, the resulting Q-values can be treated as a robust progress function. Next, we introduce a morphological symmetry augmentation that greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent-space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets, achieving an 83.3% success rate on a task that requires long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
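The abstract's first stage keeps only demonstration transitions that contribute positively to the learned progress estimate. A minimal sketch of that filtering step is below; the function and parameter names (`filter_transitions`, `q_fn`, `margin`) are illustrative assumptions, not the paper's actual API, and `q_fn` stands in for the Q-function obtained from offline RL with sparse reward:

```python
def filter_transitions(trajectory, q_fn, margin=0.0):
    """Keep only transitions whose learned Q-value (treated here as a
    task-progress estimate) increases from state s_t to s_{t+1}.

    trajectory: iterable of (state, action, next_state) tuples
    q_fn: callable mapping a state to a scalar progress estimate
    margin: minimum required progress gain (hypothetical knob)
    """
    kept = []
    for state, action, next_state in trajectory:
        # A transition is retained only if estimated progress improves.
        if q_fn(next_state) - q_fn(state) > margin:
            kept.append((state, action, next_state))
    return kept


# Toy usage: with q_fn = identity on scalar "states", the middle
# transition (0.5 -> 0.4) regresses and is dropped.
demo = [(0.0, "a", 0.5), (0.5, "b", 0.4), (0.4, "c", 0.9)]
filtered = filter_transitions(demo, q_fn=lambda s: s)
```

In the paper's setting the states would be visual observations and `q_fn` a vision-language-conditioned network, but the filtering logic reduces to this thresholded progress difference.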
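The second stage, morphological symmetry augmentation, exploits the robot's left-right (sagittal) symmetry to mirror demonstrations. A minimal sketch under assumed conventions is shown here; in particular, the assumption that the lateral coordinate sits at a single index (`flip_axis`) is a simplification, since a real implementation would apply the full symmetry transform to joint angles, rotations, and gripper sides:

```python
import numpy as np

def mirror_augment(states, actions, flip_axis=0):
    """Sketch of morphological symmetry augmentation: reflect states
    and actions across the sagittal plane by negating the lateral
    component, then append the mirrored copies to the dataset.
    Index conventions here are hypothetical.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    mirrored_states = states.copy()
    mirrored_actions = actions.copy()
    # Negate the lateral coordinate (assumed to sit at `flip_axis`).
    mirrored_states[..., flip_axis] *= -1.0
    mirrored_actions[..., flip_axis] *= -1.0
    # Return original data plus its mirror image, doubling the dataset.
    return (np.concatenate([states, mirrored_states], axis=0),
            np.concatenate([actions, mirrored_actions], axis=0))
```

Because a mirrored trajectory is physically valid whenever the morphology is symmetric, this doubles the effective demonstration count at no collection cost, which is the source of the generalization gain the abstract reports.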