Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as a separate token, we can apply effective high-capacity sequence modeling techniques to Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse suite of real-world robotic manipulation tasks. The project's website and videos can be found at https://q-transformer.github.io
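
As a rough illustration of the per-dimension discretization described above, the following minimal sketch bins each continuous action dimension and then picks an action greedily, one dimension at a time, conditioning each choice on the bins already chosen. It is not the authors' implementation. The bin count, the action range, the `q_fn(state, prefix)` signature, and the `toy_q_fn` stand-in are all assumptions made for this example, and training via temporal difference backups is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a state and the bins already chosen for
# earlier action dimensions, return one Q-value per bin of the next dimension.
QFn = Callable[[Sequence[float], Sequence[int]], List[float]]


def discretize(value: float, low: float, high: float, num_bins: int) -> int:
    """Map a continuous value in [low, high] to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    # The upper edge (frac == 1.0) would land in bin num_bins, so clamp it.
    return min(int(frac * num_bins), num_bins - 1)


def bin_center(index: int, low: float, high: float, num_bins: int) -> float:
    """Map a bin index back to the continuous value at the center of that bin."""
    width = (high - low) / num_bins
    return low + (index + 0.5) * width


def greedy_action(q_fn: QFn, state: Sequence[float], num_dims: int) -> List[int]:
    """Choose one bin per action dimension, in order, maximizing Q at each step."""
    prefix: List[int] = []
    for _ in range(num_dims):
        q_values = q_fn(state, prefix)
        best = max(range(len(q_values)), key=lambda i: q_values[i])
        prefix.append(best)
    return prefix


def toy_q_fn(state: Sequence[float], prefix: Sequence[int]) -> List[float]:
    """Hand-written stand-in for a learned model (assumption for this example).

    It has two action dimensions with four bins each and prefers bin 1 for
    the first dimension and bin 3 for the second.
    """
    targets = [1, 3]
    target = targets[len(prefix)]
    return [-float(abs(b - target)) for b in range(4)]


chosen_bins = greedy_action(toy_q_fn, state=[0.0], num_dims=2)
chosen_action = [bin_center(b, -1.0, 1.0, 4) for b in chosen_bins]
```

With this scheme, each maximization runs over the bins of a single dimension rather than over every combination of bins across all dimensions. In a learned model, `q_fn` would be replaced by a sequence model that emits the Q-values for each dimension as a token-level output.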
