

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

September 18, 2023
Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
cs.AI

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://q-transformer.github.io
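To make the tokenization idea concrete, here is a minimal sketch of per-dimension action discretization and greedy autoregressive action selection, in the spirit of the abstract. The bin count, action bounds, and the `q_model` callable (standing in for the Transformer that scores the bins of the next action dimension given the observation and previously chosen tokens) are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

# Hypothetical settings: 256 bins per action dimension, actions in [-1, 1].
NUM_BINS = 256
ACTION_LOW, ACTION_HIGH = -1.0, 1.0

def discretize(value, low=ACTION_LOW, high=ACTION_HIGH, bins=NUM_BINS):
    """Map a continuous action value to an integer token in [0, bins)."""
    frac = (value - low) / (high - low)
    return int(np.clip(np.round(frac * (bins - 1)), 0, bins - 1))

def undiscretize(token, low=ACTION_LOW, high=ACTION_HIGH, bins=NUM_BINS):
    """Map a token back to the continuous value at the center of its bin."""
    return low + (token / (bins - 1)) * (high - low)

def greedy_action(q_model, observation, action_dims):
    """Select one token per action dimension, conditioning each dimension's
    Q-values on the tokens already chosen (autoregressive maximization).

    `q_model(observation, tokens)` is a hypothetical stand-in that returns
    an array of NUM_BINS Q-values for the next action dimension.
    """
    tokens = []
    for _ in range(action_dims):
        q_values = q_model(observation, tokens)  # shape: (NUM_BINS,)
        tokens.append(int(np.argmax(q_values)))
    return np.array([undiscretize(t) for t in tokens])
```

Because each dimension is treated as a separate token, maximizing over the full action only requires a per-dimension argmax rather than a search over the exponentially large joint action space.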