JaxMARL: Multi-Agent RL Environments in JAX

November 16, 2023
Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster
cs.AI

Abstract

Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU-enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
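The massive parallelism the abstract describes comes from writing environments as pure JAX functions so they can be vectorised with `jax.vmap` and compiled with `jax.jit` onto the GPU. The sketch below illustrates that principle with a hypothetical toy two-agent environment (not JaxMARL's actual API; the environment, its state shape, and the reward are invented for illustration): one step function is written for a single environment, then batched across many parallel environments.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy 2-agent environment (illustration only, not JaxMARL's API):
# state holds the (x, y) position of each agent; an action is a displacement.
# The shared reward is the negative distance between the two agents.
def env_step(state, actions):
    next_state = state + actions  # shape (2 agents, 2 coords)
    reward = -jnp.linalg.norm(next_state[0] - next_state[1])
    return next_state, reward

# Vectorise the single-environment step across a batch of environments,
# then JIT-compile the batched step so it runs as one fused GPU kernel.
batched_step = jax.jit(jax.vmap(env_step))

n_envs = 1024
states = jnp.zeros((n_envs, 2, 2))   # 1024 envs, 2 agents, 2-D positions
actions = jnp.ones((n_envs, 2, 2))   # every agent moves by (1, 1)

next_states, rewards = batched_step(states, actions)
print(next_states.shape, rewards.shape)  # (1024, 2, 2) (1024,)
```

Because the whole rollout stays on the accelerator as array operations, stepping 1024 environments costs roughly one batched kernel launch rather than 1024 Python-level simulator calls, which is the source of the wall-clock speedups reported above.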