

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

September 18, 2023
Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
cs.AI

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://q-transformer.github.io
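To make the tokenization idea concrete, here is a minimal sketch of per-dimension action discretization and greedy autoregressive action selection, in the spirit of the abstract. The bin count, action bounds, and the `q_model` callable (standing in for the Transformer that scores the bins of the next action dimension given the observation and previously chosen tokens) are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

# Hypothetical settings: 256 bins per action dimension, actions in [-1, 1].
NUM_BINS = 256
ACTION_LOW, ACTION_HIGH = -1.0, 1.0

def discretize(value, low=ACTION_LOW, high=ACTION_HIGH, bins=NUM_BINS):
    """Map a continuous action value to an integer token in [0, bins)."""
    frac = (value - low) / (high - low)
    return int(np.clip(np.round(frac * (bins - 1)), 0, bins - 1))

def undiscretize(token, low=ACTION_LOW, high=ACTION_HIGH, bins=NUM_BINS):
    """Map a token back to the continuous value at the center of its bin."""
    return low + (token / (bins - 1)) * (high - low)

def greedy_action(q_model, observation, action_dims):
    """Select one token per action dimension, conditioning each dimension's
    Q-values on the tokens already chosen (autoregressive maximization).

    `q_model(observation, tokens)` is a hypothetical stand-in that returns
    an array of NUM_BINS Q-values for the next action dimension.
    """
    tokens = []
    for _ in range(action_dims):
        q_values = q_model(observation, tokens)  # shape: (NUM_BINS,)
        tokens.append(int(np.argmax(q_values)))
    return np.array([undiscretize(t) for t in tokens])
```

Because each dimension is treated as a separate token, maximizing over the full action only requires a per-dimension argmax rather than a search over the exponentially large joint action space.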