In-Context Imitation Learning via Next-Token Prediction
August 28, 2024
Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg
cs.AI
Abstract
We explore how to enhance next-token prediction models to perform in-context
imitation learning on a real robot, where the robot executes new tasks by
interpreting contextual information provided during the input phase, without
updating its underlying policy parameters. We propose In-Context Robot
Transformer (ICRT), a causal transformer that performs autoregressive
prediction on sensorimotor trajectories without relying on any linguistic data
or reward function. This formulation enables flexible and training-free
execution of new tasks at test time, achieved by prompting the model with
sensorimotor trajectories of the new task, comprising image observation, action,
and state tuples collected through human teleoperation. Experiments
with a Franka Emika robot demonstrate that the ICRT can adapt to new tasks
specified by prompts, even in environment configurations that differ from both
the prompt and the training data. In a multitask environment setup, ICRT
significantly outperforms current state-of-the-art next-token prediction models
in robotics on generalizing to unseen tasks. Code, checkpoints and data are
available at https://icrt.dev/
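The prompting formulation described above can be illustrated with a minimal sketch. All names below are hypothetical (not the authors' actual API): each demonstration timestep contributes an (image, state, action) token triple, and the prompt demonstrations plus the current episode's observations are concatenated into one flat causal sequence for the transformer to continue autoregressively.

```python
import numpy as np

def build_prompt_sequence(demos, current_obs):
    """Interleave prompt demonstrations and the current observation stream
    into a single flat token sequence for a causal transformer.

    demos: list of trajectories, each a list of (image, state, action) vectors.
    current_obs: list of (image, state) pairs for the episode being executed;
                 the action slots are left for the model to predict.
    """
    tokens = []
    for traj in demos:
        for image, state, action in traj:
            tokens.extend([image, state, action])   # full sensorimotor triples
    for image, state in current_obs:
        tokens.extend([image, state])               # actions not yet known
    return np.stack(tokens)

# Toy data: 4-dim feature vectors stand in for encoded images, states, actions.
demo = [(np.ones(4), np.zeros(4), np.full(4, 0.5)) for _ in range(3)]
obs = [(np.ones(4), np.zeros(4))]
seq = build_prompt_sequence([demo], obs)
print(seq.shape)  # 3 timesteps * 3 tokens + 2 pending tokens = (11, 4)
```

At test time the model would consume `seq` and emit the next token as the predicted action, which is executed on the robot before the loop repeats with the new observation appended; no parameters are updated, so task adaptation comes entirely from the prompt.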