Learning Local Communication for Large-Scale Multi-Agent Pathfinding

Abstract

Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems in which multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized, suboptimal MAPF solvers that leverage machine learning. These methods frame MAPF, from the perspective of a single agent, as a decentralized partially observable Markov decision process (Dec-POMDP): at each time step, an agent must choose an action based on its local observation. The resulting problem is typically solved via reinforcement learning (RL) or imitation learning (IL). We follow the same approach but additionally introduce a learnable communication module designed to enhance cooperation between agents through efficient feature sharing. We present Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multiple rounds of communication between neighboring agents to exchange information and improve their coordination. Our experiments show that LC-MAPF outperforms existing learning-based MAPF solvers, including IL- and RL-based approaches, across multiple metrics on a diverse range of previously unseen test scenarios. Notably, the communication mechanism does not compromise LC-MAPF's scalability, which is a common bottleneck for communication-based MAPF solvers.
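The following is a toy sketch of the two ingredients the abstract names: each agent sees only a local observation window, and neighboring agents exchange features over several communication rounds. It is not LC-MAPF's actual architecture. The names `local_observation`, `neighbors`, and `communicate` are hypothetical. The Chebyshev-radius neighborhood and the fixed mean-aggregation rule are illustrative assumptions; a learnable module would replace the fixed `self_weight` mixing with trained parameters.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, col) on the grid
Feature = List[float]


def local_observation(grid: List[List[int]], pos: Position, r: int) -> List[List[int]]:
    """Crop a (2r+1) x (2r+1) window centred on `pos`.

    Cells are 0 (free) or 1 (obstacle); cells outside the map are
    treated as obstacles (an assumption made for this sketch).
    """
    x, y = pos
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[i][j] if 0 <= i < rows and 0 <= j < cols else 1
         for j in range(y - r, y + r + 1)]
        for i in range(x - r, x + r + 1)
    ]


def neighbors(positions: List[Position], radius: int) -> List[List[int]]:
    """For each agent, return indices of other agents within `radius` (Chebyshev distance)."""
    result = []
    for i, (xi, yi) in enumerate(positions):
        result.append([
            j for j, (xj, yj) in enumerate(positions)
            if j != i and max(abs(xi - xj), abs(yi - yj)) <= radius
        ])
    return result


def communicate(features: List[Feature], nbrs: List[List[int]],
                rounds: int, self_weight: float = 0.5) -> List[Feature]:
    """Run `rounds` synchronous rounds of neighbor feature exchange.

    In each round, every agent mixes its own feature with the mean of its
    neighbors' features from the previous round (hypothetical fixed rule).
    """
    for _ in range(rounds):
        new_features = []
        for i, own in enumerate(features):
            if not nbrs[i]:
                # An agent with no neighbors keeps its feature unchanged.
                new_features.append(list(own))
                continue
            dim = len(own)
            mean = [sum(features[j][k] for j in nbrs[i]) / len(nbrs[i])
                    for k in range(dim)]
            new_features.append([self_weight * own[k] + (1 - self_weight) * mean[k]
                                 for k in range(dim)])
        # Synchronous update: all agents read features from the previous round.
        features = new_features
    return features
```

Consider three agents in a row at (0, 0), (1, 0), and (2, 0), with radius 1 and initial features [1.0], [0.0], [0.0]. After one round, the third agent's feature is still 0.0. After two rounds it becomes 0.125. Each additional round therefore lets information travel one more hop beyond an agent's immediate neighbors, while each agent only ever talks to agents nearby.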