RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
September 23, 2024
Authors: Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai
cs.AI
Abstract
Developing robust and correctable visuomotor policies for robotic
manipulation is challenging due to the lack of self-recovery mechanisms from
failures and the limitations of simple language instructions in guiding robot
actions. To address these issues, we propose a scalable data generation
pipeline that automatically augments expert demonstrations with failure
recovery trajectories and fine-grained language annotations for training. We
then introduce Rich languAge-guided failure reCovERy (RACER), a
supervisor-actor framework, which combines failure recovery data with rich
language descriptions to enhance robot control. RACER features a
vision-language model (VLM) that acts as an online supervisor, providing
detailed language guidance for error correction and task execution, and a
language-conditioned visuomotor policy as an actor to predict the next actions.
Our experimental results show that RACER outperforms the state-of-the-art
Robotic View Transformer (RVT) on RLBench across various evaluation settings,
including standard long-horizon tasks, dynamic goal-change tasks, and zero-shot
unseen tasks, achieving superior performance in both simulated and real-world
environments. Videos and code are available at:
https://rich-language-failure-recovery.github.io
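
The abstract describes a supervisor-actor control loop: a VLM supervisor watches the rollout and issues rich language guidance (including recovery instructions after a failure), and a language-conditioned visuomotor policy acts on that guidance. The following Python sketch illustrates one plausible shape of that loop; all names here (VLMSupervisor, VisuomotorActor, get_guidance, predict_action, the env interface) are hypothetical illustrations, not the paper's actual API.

```python
# Minimal sketch of a supervisor-actor loop in the spirit of RACER.
# Class/method names and the environment interface are assumptions
# for illustration only.

class VLMSupervisor:
    """Vision-language model acting as an online supervisor: inspects
    the current observation and emits fine-grained language guidance,
    including error-correction hints after failed actions."""
    def get_guidance(self, observation, task_instruction):
        # e.g., "The gripper missed the handle; move left and retry."
        ...

class VisuomotorActor:
    """Language-conditioned visuomotor policy: maps the observation
    plus the supervisor's guidance to the next robot action."""
    def predict_action(self, observation, guidance):
        ...

def rollout(env, supervisor, actor, task_instruction, max_steps=50):
    """Run one episode with the supervisor guiding the actor each step."""
    obs = env.reset()
    for _ in range(max_steps):
        # Supervisor produces detailed guidance for this step,
        # covering both normal task progress and failure recovery.
        guidance = supervisor.get_guidance(obs, task_instruction)
        # Actor executes conditioned on vision and language.
        action = actor.predict_action(obs, guidance)
        obs, done = env.step(action)
        if done:
            break
```

The key design point the abstract emphasizes is the division of labor: the supervisor supplies step-level language supervision (trained on demonstrations augmented with failure-recovery trajectories and fine-grained annotations), so the actor can correct course mid-episode instead of relying on a single static instruction.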