SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
January 22, 2025
Authors: Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
cs.AI
Abstract
Multi-agent reinforcement learning (MARL) demonstrates significant progress
in solving cooperative and competitive multi-agent problems in various
environments. One of the principal challenges in MARL is the need for explicit
prediction of the agents' behavior to achieve cooperation. To resolve this
issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends
memory transformers to multi-agent settings by pooling and globally
broadcasting individual working memories, enabling agents to exchange
information implicitly and coordinate their actions. We evaluate SRMT on the
Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck
navigation task that requires agents to pass through a narrow corridor and on a
POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently
outperforms a variety of reinforcement learning baselines, especially under
sparse rewards, and generalizes effectively to longer corridors than those seen
during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is
competitive with recent MARL, hybrid, and planning-based algorithms. These
results suggest that incorporating shared recurrent memory into
transformer-based architectures can enhance coordination in decentralized
multi-agent systems. The source code for training and evaluation is available
on GitHub: https://github.com/Aloriosa/srmt.
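The core mechanism described in the abstract, pooling per-agent recurrent memories into a shared buffer that every agent can cross-attend to, can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption of how such pooling and broadcasting might look, not the paper's actual implementation; the class and method names (SharedMemoryBlock, mem_update) and the GRU-style recurrent update are hypothetical, and the authoritative code is in the linked repository.

```python
import torch
import torch.nn as nn

class SharedMemoryBlock(nn.Module):
    """Illustrative sketch (not the SRMT implementation): each agent keeps a
    recurrent working-memory vector; all agents' memories are pooled into a
    shared buffer that every agent reads via cross-attention."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_update = nn.GRUCell(d_model, d_model)  # assumed recurrent update

    def forward(self, obs_emb: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # obs_emb: (n_agents, d_model) encoded local observations
        # memory:  (n_agents, d_model) per-agent working memories
        n_agents, d = memory.shape
        # Pool all memories and broadcast the pooled buffer to every agent.
        shared = memory.unsqueeze(0).expand(n_agents, n_agents, d)
        query = obs_emb.unsqueeze(1)                          # (n_agents, 1, d)
        attended, _ = self.cross_attn(query, shared, shared)  # read shared memory
        # Each agent updates its own memory from what it read.
        return self.mem_update(attended.squeeze(1), memory)


# Usage: one read/update step for 8 agents with 64-dim memories.
block = SharedMemoryBlock(d_model=64)
obs = torch.randn(8, 64)
mem = torch.zeros(8, 64)
mem = block(obs, mem)
```

Because each agent only attends to the shared pool rather than predicting other agents' actions, information exchange stays implicit, which is the coordination property the abstract highlights.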