JaxMARL: Multi-Agent RL Environments in JAX

Abstract

Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, which limits their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, making massively parallel RL training pipelines and environments possible. This is particularly useful for multi-agent reinforcement learning (MARL) research. First, multiple agents must be considered at each environment step, which adds computational burden. Second, sample complexity increases due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease of use with GPU-enabled efficiency, and that supports a large number of commonly used MARL environments as well as popular baseline algorithms. In terms of wall-clock time, our experiments show that, per run, our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
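To illustrate what a massively parallel, hardware-accelerated environment can look like, the following is a minimal sketch in JAX. It does not use the JaxMARL API: the toy environment, the function names (`reset`, `step`, `rollout`), and the constants are hypothetical and chosen only for illustration. The sketch assumes a cooperative task in which a few agents move in a 2D box and share one reward, and it uses `jax.vmap` to step many independent copies of that environment at once.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy settings, not taken from JaxMARL.
NUM_AGENTS = 3     # agents per environment
NUM_ENVS = 1024    # independent environment copies stepped in parallel
NUM_STEPS = 16     # length of each rollout


def reset(key):
    """Place each agent at a random 2D position in [-1, 1]^2."""
    return jax.random.uniform(key, (NUM_AGENTS, 2), minval=-1.0, maxval=1.0)


def step(positions, actions):
    """Move every agent by its action and return the shared team reward.

    The reward is the negative mean distance of the agents to the origin.
    """
    new_positions = jnp.clip(positions + 0.1 * actions, -1.0, 1.0)
    reward = -jnp.mean(jnp.linalg.norm(new_positions, axis=-1))
    return new_positions, reward


# vmap lifts the single-environment functions to a batch of environments.
batched_reset = jax.vmap(reset)
batched_step = jax.vmap(step)


@jax.jit
def rollout(key):
    """Run a random policy in all environments for NUM_STEPS steps."""
    reset_key, action_key = jax.random.split(key)
    positions = batched_reset(jax.random.split(reset_key, NUM_ENVS))

    def body(positions, step_key):
        # Random actions for every agent in every environment.
        actions = jax.random.uniform(
            step_key, (NUM_ENVS, NUM_AGENTS, 2), minval=-1.0, maxval=1.0
        )
        positions, rewards = batched_step(positions, actions)
        return positions, rewards

    step_keys = jax.random.split(action_key, NUM_STEPS)
    return jax.lax.scan(body, positions, step_keys)


final_positions, rewards = rollout(jax.random.PRNGKey(0))
print(final_positions.shape, rewards.shape)
```

Because the environment logic is written as pure functions on arrays, the whole rollout compiles into a single program that can run on an accelerator, and the number of parallel environments is just an array dimension.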
