

Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

May 20, 2026
Authors: Alexi Canesse, Benoît Goupil, Jesse Read, Sonia Vanier
cs.AI

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
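The abstract does not give the definition of β or the layers of SLIM, so the following is only a minimal NumPy sketch of the two ideas under our own assumptions, not the authors' implementation. The names `bandwidth_budget`, `init_params` and `decoupled_step` are hypothetical. β is assumed here to be (send probability) × (communication rounds) × (message dimension) divided by a reference message size, which is one plausible way to fold sparsity, rounds and dimension into a single number. The decoupling is illustrated by giving the policy latent `h` its own width (`latent_dim`) and producing messages with a separate head of width `msg_dim`. In a coupled design the message would be `h` itself, so shrinking the message would shrink the latent. Tanh linear layers and mean aggregation of the other agents' messages are also assumptions.

```python
import numpy as np


def bandwidth_budget(send_prob: float, rounds: int, msg_dim: int, ref_dim: int) -> float:
    """Hypothetical normalised per-agent budget (assumed form, not the paper's):
    expected scalars one agent transmits per step, divided by a reference size."""
    return send_prob * rounds * msg_dim / ref_dim


def init_params(obs_dim, latent_dim, msg_dim, n_actions, seed=0):
    """Random weights for a toy decoupled agent (hypothetical architecture)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, (obs_dim, latent_dim)),               # obs -> policy latent
        "W_msg": rng.normal(0.0, 0.1, (latent_dim + msg_dim, msg_dim)),     # separate message head
        "W_pol": rng.normal(0.0, 0.1, (latent_dim + msg_dim, n_actions)),   # latent + inbox -> logits
    }


def decoupled_step(obs, params, rounds=1):
    """One environment step with in-step communication for all agents.

    obs: array of shape (n_agents, obs_dim), with n_agents >= 2.
    Returns the policy latent, the received messages and the action logits.
    """
    n = obs.shape[0]
    assert n >= 2, "mean aggregation over other agents needs at least two agents"
    h = np.tanh(obs @ params["W_enc"])                 # (n, latent_dim), width independent of msg_dim
    received = np.zeros((n, params["W_msg"].shape[1]))  # (n, msg_dim), empty inbox before round 1
    for _ in range(rounds):
        # Messages come from the separate head, conditioned on h and last round's inbox.
        m = np.tanh(np.concatenate([h, received], axis=1) @ params["W_msg"])
        # Each agent receives the mean of the *other* agents' messages.
        received = (m.sum(axis=0, keepdims=True) - m) / (n - 1)
    logits = np.concatenate([h, received], axis=1) @ params["W_pol"]
    return h, received, logits


# Shrinking the message width changes the bandwidth but not the policy latent.
obs = np.random.default_rng(1).normal(size=(4, 10))
for msg_dim in (16, 2):
    params = init_params(obs_dim=10, latent_dim=64, msg_dim=msg_dim, n_actions=5)
    h, received, logits = decoupled_step(obs, params, rounds=2)
    beta = bandwidth_budget(send_prob=1.0, rounds=2, msg_dim=msg_dim, ref_dim=64)
    print(msg_dim, h.shape, received.shape, logits.shape, beta)
# prints:
# 16 (4, 64) (4, 16) (4, 5) 0.5
# 2 (4, 64) (4, 2) (4, 5) 0.0625
```

The point of the sketch is the shape bookkeeping. `h` stays `(4, 64)` whichever message width is chosen, so a tighter bandwidth budget does not reduce the capacity of the representation the policy acts on.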