AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
August 7, 2023
Authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
cs.AI
Abstract
StarCraft II is one of the most challenging simulated reinforcement learning
environments; it is partially observable, stochastic, multi-agent, and
mastering StarCraft II requires strategic planning over long time horizons with
real-time low-level execution. It also has an active professional competitive
scene. StarCraft II is uniquely suited for advancing offline RL algorithms,
both because of its challenging nature and because Blizzard has released a
massive dataset of millions of StarCraft II games played by human players. This
paper leverages that and establishes a benchmark, called AlphaStar Unplugged,
introducing unprecedented challenges for offline reinforcement learning. We
define a dataset (a subset of Blizzard's release), tools standardizing an API
for machine learning methods, and an evaluation protocol. We also present
baseline agents, including behavior cloning and offline variants of actor-critic
and MuZero. We improve the state of the art for agents trained using only
offline data, and we achieve a 90% win rate against the previously published
AlphaStar behavior cloning agent.
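Behavior cloning, the simplest of the baselines above, is supervised learning: a policy is trained to maximize the log-likelihood of the expert (human) actions in the dataset. The sketch below illustrates the idea on toy data with a linear softmax policy trained by gradient descent; it is a minimal illustration only, and all names, shapes, and the synthetic data are hypothetical (the actual AlphaStar Unplugged agents are large neural networks trained on StarCraft II replays).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bc_loss_and_grad(W, states, actions):
    """Mean cross-entropy of a linear softmax policy against expert
    actions, plus its gradient w.r.t. the weight matrix W."""
    logits = states @ W                      # (batch, num_actions)
    probs = softmax(logits)
    batch = states.shape[0]
    # Negative log-likelihood of the expert-chosen actions.
    loss = -np.log(probs[np.arange(batch), actions] + 1e-12).mean()
    # Softmax cross-entropy gradient: probs minus one-hot targets.
    d_logits = probs.copy()
    d_logits[np.arange(batch), actions] -= 1.0
    grad = states.T @ d_logits / batch
    return loss, grad

# Synthetic "expert" data: actions are a deterministic (linearly
# realizable) function of the observations, so cloning can succeed.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 8))           # fake observations
W_true = rng.normal(size=(8, 4))             # hidden expert policy
actions = np.argmax(states @ W_true, axis=1) # fake expert actions

W = np.zeros((8, 4))
for _ in range(200):                         # plain gradient descent
    loss, grad = bc_loss_and_grad(W, states, actions)
    W -= 0.5 * grad
```

Starting from uniform predictions (loss ln 4 ≈ 1.39), the cloned policy's loss drops as it learns to imitate the expert's action choices. The offline actor-critic and MuZero baselines build on this by additionally exploiting reward information in the replays rather than imitating actions alone.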