BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
February 22, 2024
Authors: Catherine Weaver, Chen Tang, Ce Hao, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan
cs.AI
Abstract
Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective in capturing intricate patterns of motion sequences but struggles to adapt to new environments or distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but struggles with sample inefficiency and with handling complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy learned from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy to model the sequential decision-making process of human experts and correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than downstream learning. Videos and code are available at: https://sites.google.com/berkeley.edu/BeTAIL/home.
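As a rough illustration of the residual architecture the abstract describes, the sketch below composes a frozen BeT base policy with an online AIL residual policy by adding a scaled residual correction to the base action and clipping to the environment's action bounds. This is a minimal sketch of residual policy learning in general; the class and parameter names (ResidualBeTAILPolicy, residual_scale) and the exact composition scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class ResidualBeTAILPolicy:
    """Minimal sketch of a residual action composition (hypothetical names).

    A pretrained Behavior Transformer (BeT) proposes a base action from the
    recent state history; a residual policy trained online with an
    adversarial imitation learning (AIL) reward adds a bounded correction.
    """

    def __init__(self, bet_policy, residual_policy, action_low, action_high,
                 residual_scale=0.1):
        self.bet_policy = bet_policy            # frozen, pretrained on demonstrations
        self.residual_policy = residual_policy  # trained online against the AIL discriminator
        self.action_low = np.asarray(action_low, dtype=float)
        self.action_high = np.asarray(action_high, dtype=float)
        self.residual_scale = residual_scale    # bounds the magnitude of the correction

    def act(self, state_history):
        # Base action from the sequence model, conditioned on recent states.
        a_base = self.bet_policy(state_history)
        # Residual correction conditioned on the current state and base action.
        a_res = self.residual_policy(state_history[-1], a_base)
        # Combine and keep the result inside the environment's action bounds.
        action = a_base + self.residual_scale * a_res
        return np.clip(action, self.action_low, self.action_high)

# Example usage with stand-in callables (hypothetical):
bet = lambda history: np.array([0.5, 0.0])    # [throttle, steering] proposal
res = lambda state, a: np.array([-0.2, 0.1])  # AIL-trained correction
policy = ResidualBeTAILPolicy(bet, res, action_low=[-1, -1], action_high=[1, 1])
print(policy.act([np.zeros(4)]))              # -> [0.48, 0.01]
```

Freezing the base policy and bounding the residual is the usual way such a scheme keeps the online learner close to the demonstrated behavior while still letting it correct for out-of-distribution states.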