
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

September 18, 2023
Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
cs.AI

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://q-transformer.github.io.
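As a rough illustration of the per-dimension action tokenization and autoregressive maximization described in the abstract, the sketch below discretizes each continuous action dimension into bins and selects an action greedily, one dimension at a time, conditioned on the bins already chosen. It is a minimal sketch, not the authors' implementation: the bin count, action bounds, and the `toy_q_values` placeholder (which stands in for the Transformer Q-function) are illustrative assumptions.

```python
import numpy as np

# Sketch of per-dimension action discretization and greedy autoregressive
# action selection. `toy_q_values` is a placeholder for the Transformer
# Q-function and is NOT the paper's actual model or hyperparameters.

NUM_BINS = 256                      # bins per action dimension (assumed)
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed continuous action bounds


def discretize(value):
    """Map a continuous value in [ACTION_LOW, ACTION_HIGH] to a bin index."""
    frac = (value - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
    return int(np.clip(round(frac * (NUM_BINS - 1)), 0, NUM_BINS - 1))


def dequantize(bin_index):
    """Map a bin index back to a continuous value."""
    return ACTION_LOW + (ACTION_HIGH - ACTION_LOW) * bin_index / (NUM_BINS - 1)


def toy_q_values(observation, previous_tokens):
    """Placeholder Q-function: returns Q-values over the bins of the next
    action dimension, conditioned on the observation and the action tokens
    chosen so far (deterministic within a run, but otherwise arbitrary)."""
    seed = hash((observation.tobytes(), tuple(previous_tokens))) % (2**32)
    return np.random.default_rng(seed).standard_normal(NUM_BINS)


def select_action(observation, action_dims):
    """Greedy autoregressive selection: argmax over bins, one dimension at a
    time, each choice conditioned on the previously selected tokens."""
    tokens = []
    for _ in range(action_dims):
        q = toy_q_values(observation, tokens)
        tokens.append(int(np.argmax(q)))
    return np.array([dequantize(t) for t in tokens])


if __name__ == "__main__":
    obs = np.zeros(8, dtype=np.float32)        # dummy observation
    print(select_action(obs, action_dims=7))   # e.g. a 7-DoF arm action
```

Because each dimension contributes only one token with a small vocabulary, the argmax over a high-dimensional action space factorizes into a sequence of cheap per-dimension argmaxes, which is what makes standard sequence-modeling machinery applicable to Q-learning here.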