Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck: a single shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly shrinks the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, number of communication rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while still benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
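
To make the idea of a single comparable constraint concrete, the following is a minimal sketch of how sparsity, rounds, and message dimension could be folded into one normalised number. The abstract does not give the definition of β, so the product form, the function name `bandwidth_budget`, and the reference values `ref_rounds` and `ref_dim` are all assumptions made for illustration, not the paper's formula.

```python
def bandwidth_budget(sparsity, rounds, msg_dim, ref_rounds, ref_dim):
    """Hypothetical normalised per-agent bandwidth budget.

    ASSUMPTION: beta is modelled as the expected number of values an agent
    sends per environment step, divided by a reference budget. This is an
    illustrative guess, not the definition used in the paper.

    sparsity   -- fraction of opportunities on which a message is sent (0..1)
    rounds     -- communication rounds per environment step
    msg_dim    -- number of values per message
    ref_rounds -- rounds in the reference (full-bandwidth) configuration
    ref_dim    -- message dimension in the reference configuration
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    if min(rounds, msg_dim, ref_rounds, ref_dim) <= 0:
        raise ValueError("rounds and dimensions must be positive")
    return sparsity * rounds * msg_dim / (ref_rounds * ref_dim)


# Two different configurations that cost the same under this sketch:
dense_single_round = bandwidth_budget(1.0, 1, 16, ref_rounds=1, ref_dim=16)
sparse_two_rounds = bandwidth_budget(0.5, 2, 16, ref_rounds=1, ref_dim=16)
```

Under this assumed form, sending on every opportunity for one round and sending half the time for two rounds yield the same budget, which is the sense in which a single scalar lets otherwise different communication schemes be compared at equal cost.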