Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, such as search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck, in which a single shared latent representation serves both policy execution and inter-agent communication. Consequently, reducing the message size directly restricts the policy's latent space, often leading to significant performance degradation. We address this problem with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, number of communication rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while still benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks in which communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
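
To make the idea of a single comparable bandwidth budget concrete, the following is a minimal sketch of one way sparsity, rounds, and message dimension could be folded into one normalised number. The abstract does not state how β is defined, so the formula here (expected scalars sent per agent per step, divided by the same quantity for a reference configuration) is purely an assumption for illustration, as are the names `CommConfig`, `send_prob`, `expected_scalars_per_step`, and `normalised_budget`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommConfig:
    """Hypothetical per-agent communication settings (illustrative names)."""
    send_prob: float  # fraction of steps on which the agent transmits (sparsity)
    rounds: int       # communication rounds per environment step
    msg_dim: int      # number of scalars in one message


def expected_scalars_per_step(cfg: CommConfig) -> float:
    """Expected number of scalars one agent sends per environment step."""
    return cfg.send_prob * cfg.rounds * cfg.msg_dim


def normalised_budget(cfg: CommConfig, reference: CommConfig) -> float:
    """Assumed budget: traffic of `cfg` relative to a reference configuration.

    This is NOT the paper's definition of beta, only one plausible reading.
    """
    return expected_scalars_per_step(cfg) / expected_scalars_per_step(reference)


# Reference: dense communication, 2 rounds, 64-dimensional messages.
reference = CommConfig(send_prob=1.0, rounds=2, msg_dim=64)

# Two different ways of cutting traffic to a quarter of the reference.
sparse = CommConfig(send_prob=0.25, rounds=2, msg_dim=64)  # send less often
narrow = CommConfig(send_prob=1.0, rounds=1, msg_dim=32)   # fewer rounds, smaller messages

print(normalised_budget(sparse, reference))
print(normalised_budget(narrow, reference))
```

Under this assumed formula, the sparse and the narrow configurations map to the same budget value, which is the sense in which a single normalised quantity would let methods that save bandwidth in different ways be compared on equal terms.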