MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning
July 23, 2024
Authors: Florian Felten, Umut Ucak, Hicham Azmani, Gao Peng, Willem Röpke, Hendrik Baier, Patrick Mannion, Diederik M. Roijers, Jordan K. Terry, El-Ghazali Talbi, Grégoire Danoy, Ann Nowé, Roxana Rădulescu
cs.AI
Abstract
Many challenging tasks such as managing traffic systems, electricity grids,
or supply chains involve complex decision-making processes that must balance
multiple conflicting objectives and coordinate the actions of various
independent decision-makers (DMs). One perspective for formalising and
addressing such tasks is multi-objective multi-agent reinforcement learning
(MOMARL). MOMARL broadens reinforcement learning (RL) to problems with multiple
agents each needing to consider multiple objectives in their learning process.
In reinforcement learning research, benchmarks are crucial in facilitating
progress, evaluation, and reproducibility. The significance of benchmarks is
underscored by the existence of numerous benchmark frameworks developed for
various RL paradigms, including single-agent RL (e.g., Gymnasium), multi-agent
RL (e.g., PettingZoo), and single-agent multi-objective RL (e.g.,
MO-Gymnasium). To support the advancement of the MOMARL field, we introduce
MOMAland, the first collection of standardised environments for multi-objective
multi-agent reinforcement learning. MOMAland addresses the need for
comprehensive benchmarking in this emerging field, offering over 10 diverse
environments that vary in the number of agents, state representations, reward
structures, and utility considerations. To provide strong baselines for future
research, MOMAland also includes algorithms capable of learning policies in
such settings.
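
As a rough illustration of how a standardised multi-objective multi-agent benchmark of this kind is typically consumed, the sketch below assumes MOMAland environments expose a PettingZoo-style parallel API in which `step()` returns per-agent rewards that are vectors with one component per objective. The helper `random_rollout` and the way the environment is constructed are illustrative assumptions, not details confirmed by this abstract.

```python
import numpy as np


def random_rollout(env, num_steps=100, seed=42):
    """Run a uniformly random joint policy and accumulate each agent's vector return.

    Assumes a PettingZoo-style parallel environment whose per-agent rewards are
    arrays with one entry per objective (a MOMAland-style convention).
    """
    observations, infos = env.reset(seed=seed)
    returns = {}

    for _ in range(num_steps):
        # Sample one action per currently active agent from its action space.
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)

        for agent, reward in rewards.items():
            # Each reward is a vector: one component per objective.
            reward = np.asarray(reward, dtype=float)
            returns[agent] = returns.get(agent, np.zeros_like(reward)) + reward

        if not env.agents:  # all agents terminated or truncated
            break

    return returns
```

A caller would construct one of the MOMAland environments (the exact constructor depends on the chosen environment) and pass it to `random_rollout`; the returned dictionary of per-agent vector returns is the kind of quantity that multi-objective evaluation and utility-based comparisons in this setting operate on.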