BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay

Abstract

Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective at capturing intricate patterns in motion sequences, but it struggles to adapt to new environments or to the distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but it is sample-inefficient and struggles to handle complex motion patterns. We therefore propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy learned from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy in order to model the sequential decision-making process of human experts and to correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than those used in downstream learning. Videos and code are available at: https://sites.google.com/berkeley.edu/BeTAIL/home.
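As an illustration of what adding a residual policy to a base policy can look like, the following is a minimal sketch and not the authors' implementation. It assumes that the residual action is bounded to a small range and added element-wise to the base action, and that the sum is clipped to the valid action range. The function names and the parameters `residual_limit` and `action_limit` are hypothetical and do not come from the abstract.

```python
from typing import List, Sequence


def clip(value: float, low: float, high: float) -> float:
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def combine_actions(
    base_action: Sequence[float],
    residual_action: Sequence[float],
    residual_limit: float = 0.1,  # assumed bound on the residual correction
    action_limit: float = 1.0,    # assumed bound on the final action
) -> List[float]:
    """Add a bounded residual correction to a base policy's action.

    base_action stands in for the output of a pretrained sequence model,
    and residual_action for the output of an online-trained residual policy.
    """
    combined = []
    for base, residual in zip(base_action, residual_action):
        # Keep the correction small so the base policy stays dominant.
        bounded = clip(residual, -residual_limit, residual_limit)
        # Keep the final action inside the valid control range.
        combined.append(clip(base + bounded, -action_limit, action_limit))
    return combined


if __name__ == "__main__":
    # Hypothetical two-dimensional action, e.g. steering and throttle.
    print(combine_actions([0.5, -0.95], [0.3, -0.2]))
```

Bounding the residual is one way to let the online policy correct the base policy in unfamiliar states without discarding the behavior learned from demonstrations; the bound used here is an arbitrary placeholder.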
