Learning Local Communication for Large-Scale Multi-Agent Pathfinding

Abstract

Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems in which multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF, from a single agent's perspective, as a decentralized partially observable Markov decision process (Dec-POMDP) in which, at each time step, an agent must choose an action based on its local observation; the problem is typically solved via reinforcement learning (RL) or imitation learning (IL). We follow the same approach but additionally introduce a learnable communication module designed to enhance cooperation between agents through efficient feature sharing. We present Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multiple rounds of communication between neighboring agents to exchange information and improve their coordination. Our experiments show that the proposed method outperforms existing learning-based MAPF solvers, including IL- and RL-based approaches, on multiple metrics across a diverse range of (unseen) test scenarios. Notably, the communication mechanism does not compromise LC-MAPF's scalability, a common bottleneck for communication-based MAPF solvers.
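
The abstract does not describe the communication module in detail, so the following is only a toy illustration of the general idea of multi-round communication between neighboring agents, not the LC-MAPF architecture. It assumes that agents within a fixed grid radius count as neighbors and replaces the learned module with a fixed averaging rule; the names `neighbors_within` and `communicate`, the radius, and the 50/50 mixing weight are all hypothetical.

```python
from typing import List, Tuple

Position = Tuple[int, int]
Features = List[float]


def neighbors_within(positions: List[Position], radius: int) -> List[List[int]]:
    """For each agent, list the indices of other agents within `radius`
    (Chebyshev distance on the grid). The radius is an assumed parameter."""
    result = []
    for i, (xi, yi) in enumerate(positions):
        near = [
            j
            for j, (xj, yj) in enumerate(positions)
            if j != i and max(abs(xi - xj), abs(yi - yj)) <= radius
        ]
        result.append(near)
    return result


def communicate(
    features: List[Features], neighbors: List[List[int]], rounds: int
) -> List[Features]:
    """Run `rounds` of local message passing.

    In each round every agent mixes its own features with the mean of its
    neighbors' features. This fixed average is a stand-in for a learned
    communication module; it is not the rule used by LC-MAPF.
    """
    current = [list(f) for f in features]
    for _ in range(rounds):
        updated = []
        for i, own in enumerate(current):
            if not neighbors[i]:
                # Isolated agents keep their features unchanged.
                updated.append(list(own))
                continue
            count = len(neighbors[i])
            mean_msg = [
                sum(current[j][k] for j in neighbors[i]) / count
                for k in range(len(own))
            ]
            updated.append([0.5 * o + 0.5 * m for o, m in zip(own, mean_msg)])
        current = updated
    return current


# Three agents in a line; only adjacent pairs are within radius 2.
positions = [(0, 0), (2, 0), (4, 0)]
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
links = neighbors_within(positions, radius=2)
after_one = communicate(one_hot, links, rounds=1)
after_two = communicate(one_hot, links, rounds=2)
```

In this toy setup the first agent carries no information about the third after one round (`after_one[0][2]` is zero), but does after two rounds, because the middle agent relays it. Each agent only ever reads from its local neighborhood, so the per-agent cost of a round depends on the number of neighbors rather than on the total number of agents.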