RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

Abstract

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment, action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100 to 1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
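The abstract describes the self-improvement loop only at a high level: train a generalist, adapt it to a new task from a small set of demonstrations, let the adapted model generate more data, and fold that data into the next training round. The sketch below illustrates that data flow under toy assumptions. Every name in it (`Episode`, `Agent`, `train`, `fine_tune`, `collect`, `self_improvement_step`) is hypothetical and is not RoboCat's actual code; the "agent" merely records which tasks it has seen, and the environment outcome is a stub. Keeping only successful rollouts is also an assumption made for illustration, since the abstract does not say how self-generated data is filtered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Episode:
    """Hypothetical episode record: a task name plus a success flag."""
    task: str
    success: bool


@dataclass
class Agent:
    """Hypothetical agent; the set of known tasks is a toy proxy for model weights."""
    known_tasks: frozenset = frozenset()


def train(dataset: List[Episode]) -> Agent:
    """Hypothetical trainer: the toy agent 'learns' every task present in the data."""
    return Agent(known_tasks=frozenset(ep.task for ep in dataset))


def fine_tune(agent: Agent, demos: List[Episode]) -> Agent:
    """Hypothetical adaptation step using a small demonstration set for a new task."""
    return Agent(known_tasks=agent.known_tasks | {ep.task for ep in demos})


def collect(agent: Agent, task: str, n: int,
            succeed: Callable[[int], bool]) -> List[Episode]:
    """Roll out the agent n times; `succeed` stubs the environment's outcome."""
    return [Episode(task, task in agent.known_tasks and succeed(i)) for i in range(n)]


def self_improvement_step(dataset: List[Episode], new_task: str,
                          demos: List[Episode], n_rollouts: int,
                          succeed: Callable[[int], bool]) -> List[Episode]:
    generalist = train(dataset)                # 1. train on the current multi-task data
    specialist = fine_tune(generalist, demos)  # 2. adapt with a few demonstrations
    rollouts = collect(specialist, new_task, n_rollouts, succeed)  # 3. self-generate data
    kept = [ep for ep in rollouts if ep.success]  # assumption: keep successes only
    return dataset + demos + kept              # 4. training data for the next iteration
```

Calling `self_improvement_step` repeatedly, each time passing in the dataset it returned, mimics successive training iterations in which the model's own experience grows the training set.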
