AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

Abstract

StarCraft II is one of the most challenging simulated reinforcement learning environments: it is partially observable, stochastic, and multi-agent, and mastering it requires strategic planning over long time horizons combined with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset to establish a benchmark, called AlphaStar Unplugged, that introduces unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning and offline variants of actor-critic and MuZero. We improve the state of the art of agents trained using only offline data, and we achieve a 90% win rate against the previously published AlphaStar behavior cloning agent.
