

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

August 7, 2023
作者: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
cs.AI

Abstract

StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning, and offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve a 90% win rate against the previously published AlphaStar behavior cloning agent.
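The behavior cloning baseline mentioned above reduces offline RL to supervised learning: fit a policy to the (observation, action) pairs in the fixed dataset by maximizing the log-likelihood of the human actions. The following is a minimal illustrative sketch of that idea on toy data with a linear softmax policy; all names and the synthetic dataset are assumptions for illustration, not the paper's actual architecture, which uses large neural networks over StarCraft II observations.

```python
import numpy as np

# Behavior cloning sketch: fit a linear softmax policy to a fixed
# dataset of (observation, action) pairs. The data here is synthetic;
# each "expert" action is the argmax of the first NUM_ACTIONS features.
rng = np.random.default_rng(0)
OBS_DIM, NUM_ACTIONS, N = 4, 3, 256

obs = rng.normal(size=(N, OBS_DIM))            # offline observations
actions = np.argmax(obs[:, :NUM_ACTIONS], axis=1)  # "expert" actions

W = np.zeros((OBS_DIM, NUM_ACTIONS))           # policy parameters

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Gradient descent on the negative log-likelihood of the expert actions.
for _ in range(200):
    grad_logits = softmax(obs @ W)
    grad_logits[np.arange(N), actions] -= 1.0  # dL/dlogits = p - onehot
    W -= 0.5 * (obs.T @ grad_logits) / N

# How often the cloned policy reproduces the expert action.
accuracy = (np.argmax(obs @ W, axis=1) == actions).mean()
```

The key property of behavior cloning, and what makes it a natural first baseline for AlphaStar Unplugged, is that it never queries the environment: training touches only the logged dataset.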