
In-Context Imitation Learning via Next-Token Prediction

August 28, 2024
Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg
cs.AI

Abstract

We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible and training-free execution of new tasks at test time, achieved by prompting the model with sensorimotor trajectories of the new task, comprising image observation, action, and state tuples collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multi-task environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics in generalizing to unseen tasks. Code, checkpoints, and data are available at https://icrt.dev/
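
To make the formulation concrete, below is a minimal sketch of how a causal transformer could autoregressively predict actions from a sensorimotor trajectory interleaved as (image, state, action) tokens. The class name, token dimensions, placeholder encoders, and the choice to read predicted actions off the state-token positions are illustrative assumptions for this sketch, not details of the authors' ICRT implementation.

```python
import torch
import torch.nn as nn


class SensorimotorTransformer(nn.Module):
    """Illustrative causal transformer over interleaved (image, state, action) tokens.

    A minimal sketch of the next-token-prediction formulation described in the
    abstract; all dimensions and encoders are placeholder assumptions.
    """

    def __init__(self, token_dim=256, n_layers=4, n_heads=8,
                 image_feat_dim=512, state_dim=8, action_dim=7):
        super().__init__()
        # Placeholder linear projections standing in for real image/state/action encoders.
        self.image_proj = nn.Linear(image_feat_dim, token_dim)
        self.state_proj = nn.Linear(state_dim, token_dim)
        self.action_proj = nn.Linear(action_dim, token_dim)
        layer = nn.TransformerEncoderLayer(token_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(token_dim, action_dim)

    def forward(self, image_feats, states, actions):
        # Interleave per-timestep tokens as [img_t, state_t, act_t, img_{t+1}, ...].
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.image_proj(image_feats),
             self.state_proj(states),
             self.action_proj(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask: each token attends only to earlier tokens in the trajectory.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        hidden = self.backbone(tokens, mask=mask)
        # Predict the action at step t from the hidden state of the state token at t.
        return self.action_head(hidden[:, 1::3, :])


if __name__ == "__main__":
    model = SensorimotorTransformer()
    # Random tensors purely to exercise the forward pass: 12 timesteps of
    # pre-extracted image features, robot states, and actions.
    img = torch.randn(1, 12, 512)
    state = torch.randn(1, 12, 8)
    act = torch.randn(1, 12, 7)
    pred_actions = model(img, state, act)
    print(pred_actions.shape)  # torch.Size([1, 12, 7])
```

Under this formulation, the prompt trajectory collected by human teleoperation would simply be concatenated in front of the current episode's tokens at test time, so the predicted action for the latest timestep is conditioned on the in-context demonstration without any update to the model parameters.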
