In-Context Imitation Learning via Next-Token Prediction

August 28, 2024
Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg
cs.AI

Abstract

We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided at input time, without updating its underlying policy parameters. We propose the In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible, training-free execution of new tasks at test time: the model is prompted with sensorimotor trajectories of the new task, composed of image observation, action, and state tuples collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multi-task environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics at generalizing to unseen tasks. Code, checkpoints, and data are available at https://icrt.dev/.
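To make the prompting formulation concrete, below is a minimal, hypothetical PyTorch sketch of next-token prediction over interleaved sensorimotor tokens. The class name `ICRTSketch`, the dimensions, the single-vector image encoding, and the interleaving scheme are illustrative assumptions for exposition, not the authors' released implementation (see https://icrt.dev/ for the actual code).

```python
import torch
import torch.nn as nn

class ICRTSketch(nn.Module):
    """Minimal sketch of in-context imitation learning via next-token
    prediction over sensorimotor trajectories. Hypothetical simplification:
    images are assumed pre-encoded to feature vectors, and layer sizes
    are illustrative rather than taken from the paper."""

    def __init__(self, obs_dim=512, state_dim=7, action_dim=7, d_model=256):
        super().__init__()
        # Project each modality (image feature, robot state, action) into
        # a shared token space so they can be interleaved in one sequence.
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, obs, state, actions):
        # obs: (B, T, obs_dim), state: (B, T, state_dim),
        # actions: (B, T, action_dim).
        B, T, _ = obs.shape
        # Interleave per timestep as [obs_t, state_t, action_t] -> (B, 3T, D).
        tokens = torch.stack(
            (self.obs_proj(obs), self.state_proj(state), self.action_proj(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask: each token attends only to earlier tokens, so the
        # model performs next-token prediction along the trajectory.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        hidden = self.backbone(tokens, mask=mask)
        # Read the predicted action a_t from the state token of step t
        # (positions 1, 4, 7, ...), which sees (obs_t, state_t) and the
        # entire prompt, but never a_t itself.
        return self.action_head(hidden[:, 1::3, :])

# Prompting: concatenate a teleoperated demo of the new task with the
# current episode; the last predicted action is what the robot executes.
# The final action slot of the live episode is just a placeholder, since
# the prediction is read from the state token, which cannot attend to it.
model = ICRTSketch()
demo = [torch.randn(1, 20, d) for d in (512, 7, 7)]  # prompt trajectory
live = [torch.randn(1, 5, d) for d in (512, 7, 7)]   # episode so far
obs, state, act = (torch.cat(p, dim=1) for p in zip(demo, live))
next_action = model(obs, state, act)[:, -1]          # shape (1, action_dim)
```

Reading the prediction off the state token keeps the setup strictly causal: new tasks are specified purely by what precedes the current timestep in the context window, with no parameter updates at test time.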
