SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
January 22, 2025
Authors: Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
cs.AI
Abstract
Multi-agent reinforcement learning (MARL) demonstrates significant progress
in solving cooperative and competitive multi-agent problems in various
environments. One of the principal challenges in MARL is the need for explicit
prediction of the agents' behavior to achieve cooperation. To resolve this
issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends
memory transformers to multi-agent settings by pooling and globally
broadcasting individual working memories, enabling agents to exchange
information implicitly and coordinate their actions. We evaluate SRMT on the
Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck
navigation task that requires agents to pass through a narrow corridor and on a
POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently
outperforms a variety of reinforcement learning baselines, especially under
sparse rewards, and generalizes effectively to longer corridors than those seen
during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is
competitive with recent MARL, hybrid, and planning-based algorithms. These
results suggest that incorporating shared recurrent memory into
transformer-based architectures can enhance coordination in decentralized
multi-agent systems. The source code for training and evaluation is available
on GitHub: https://github.com/Aloriosa/srmt.
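The core mechanism described in the abstract, pooling per-agent recurrent memories into a shared buffer that every agent can cross-attend to, can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption of how such pooling and broadcasting might look, not the paper's actual implementation; the class and method names (SharedMemoryBlock, mem_update) and the GRU-style recurrent update are hypothetical, and the authoritative code is in the linked repository.

```python
import torch
import torch.nn as nn

class SharedMemoryBlock(nn.Module):
    """Illustrative sketch (not the SRMT implementation): each agent keeps a
    recurrent working-memory vector; all agents' memories are pooled into a
    shared buffer that every agent reads via cross-attention."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_update = nn.GRUCell(d_model, d_model)  # assumed recurrent update

    def forward(self, obs_emb: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # obs_emb: (n_agents, d_model) encoded local observations
        # memory:  (n_agents, d_model) per-agent working memories
        n_agents, d = memory.shape
        # Pool all memories and broadcast the pooled buffer to every agent.
        shared = memory.unsqueeze(0).expand(n_agents, n_agents, d)
        query = obs_emb.unsqueeze(1)                          # (n_agents, 1, d)
        attended, _ = self.cross_attn(query, shared, shared)  # read shared memory
        # Each agent updates its own memory from what it read.
        return self.mem_update(attended.squeeze(1), memory)


# Usage: one read/update step for 8 agents with 64-dim memories.
block = SharedMemoryBlock(d_model=64)
obs = torch.randn(8, 64)
mem = torch.zeros(8, 64)
mem = block(obs, mem)
```

Because each agent only attends to the shared pool rather than predicting other agents' actions, information exchange stays implicit, which is the coordination property the abstract highlights.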