JaxMARL: Multi-Agent RL Environments in JAX

Abstract

Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research, for two reasons. First, multiple agents must be considered at each environment step, adding computational burden. Second, sample complexity is increased by non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease of use with GPU-enabled efficiency and supports a large number of commonly used MARL environments as well as popular baseline algorithms. In terms of wall-clock time, our experiments show that, per run, our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
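To illustrate what "vectorised" and "massively parallel" environments can mean in JAX, here is a minimal sketch. It does not use the JaxMARL API: the toy environment, the `reset` and `step` functions, and the constants `NUM_AGENTS` and `NUM_ENVS` are hypothetical and chosen only for illustration. The point is that an environment written as pure functions can be batched with `jax.vmap` and compiled with `jax.jit`, so that thousands of copies advance in a single call on an accelerator.

```python
import jax
import jax.numpy as jnp

NUM_AGENTS = 3    # hypothetical number of agents per environment
NUM_ENVS = 4096   # hypothetical number of environments run in parallel


def reset(key):
    # Toy environment (an assumption, not from the paper): every agent
    # starts at a random 2-D position in [-1, 1].
    return jax.random.uniform(key, (NUM_AGENTS, 2), minval=-1.0, maxval=1.0)


def step(positions, actions):
    # Each action is a 2-D velocity. The shared team reward is the negative
    # mean distance of the agents from the origin.
    new_positions = jnp.clip(positions + 0.1 * actions, -1.0, 1.0)
    reward = -jnp.mean(jnp.linalg.norm(new_positions, axis=-1))
    return new_positions, reward


# vmap adds a leading batch axis over environments; jit compiles the batched
# function so that all environments advance in one device call.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))

key = jax.random.PRNGKey(0)
reset_key, action_key = jax.random.split(key)

states = batched_reset(jax.random.split(reset_key, NUM_ENVS))
actions = jax.random.uniform(
    action_key, (NUM_ENVS, NUM_AGENTS, 2), minval=-1.0, maxval=1.0
)
states, rewards = batched_step(states, actions)
```

Because the whole step is a pure function of arrays, the same pattern extends to rolling out many steps with `jax.lax.scan` and to keeping the policy update on the same device, which avoids transferring data between CPU and accelerator during training.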
