AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
August 7, 2023
Authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
cs.AI
Abstract
StarCraft II is one of the most challenging simulated reinforcement learning
environments; it is partially observable, stochastic, multi-agent, and
mastering StarCraft II requires strategic planning over long time horizons with
real-time low-level execution. It also has an active professional competitive
scene. StarCraft II is uniquely suited for advancing offline RL algorithms,
both because of its challenging nature and because Blizzard has released a
massive dataset of millions of StarCraft II games played by human players. This
paper leverages that and establishes a benchmark, called AlphaStar Unplugged,
introducing unprecedented challenges for offline reinforcement learning. We
define a dataset (a subset of Blizzard's release), tools standardizing an API
for machine learning methods, and an evaluation protocol. We also present
baseline agents, including behavior cloning and offline variants of actor-critic
and MuZero. We improve the state of the art for agents trained using only
offline data, and we achieve a 90% win rate against the previously published
AlphaStar behavior cloning agent.
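Behavior cloning, the simplest of the baselines above, is supervised learning: a policy is trained to maximize the log-likelihood of the expert (human) actions in the dataset. The sketch below illustrates the idea on toy data with a linear softmax policy trained by gradient descent; it is a minimal illustration only, and all names, shapes, and the synthetic data are hypothetical (the actual AlphaStar Unplugged agents are large neural networks trained on StarCraft II replays).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bc_loss_and_grad(W, states, actions):
    """Mean cross-entropy of a linear softmax policy against expert
    actions, plus its gradient w.r.t. the weight matrix W."""
    logits = states @ W                      # (batch, num_actions)
    probs = softmax(logits)
    batch = states.shape[0]
    # Negative log-likelihood of the expert-chosen actions.
    loss = -np.log(probs[np.arange(batch), actions] + 1e-12).mean()
    # Softmax cross-entropy gradient: probs minus one-hot targets.
    d_logits = probs.copy()
    d_logits[np.arange(batch), actions] -= 1.0
    grad = states.T @ d_logits / batch
    return loss, grad

# Synthetic "expert" data: actions are a deterministic (linearly
# realizable) function of the observations, so cloning can succeed.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 8))           # fake observations
W_true = rng.normal(size=(8, 4))             # hidden expert policy
actions = np.argmax(states @ W_true, axis=1) # fake expert actions

W = np.zeros((8, 4))
for _ in range(200):                         # plain gradient descent
    loss, grad = bc_loss_and_grad(W, states, actions)
    W -= 0.5 * grad
```

Starting from uniform predictions (loss ln 4 ≈ 1.39), the cloned policy's loss drops as it learns to imitate the expert's action choices. The offline actor-critic and MuZero baselines build on this by additionally exploiting reward information in the replays rather than imitating actions alone.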