Decoupling Communication from Policy: Robust Multi-Agent Reinforcement Learning under Bandwidth Constraints

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, such as search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still have a coupled bottleneck, in which a single shared latent representation serves both policy execution and inter-agent communication. Consequently, reducing message size directly shrinks the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, number of communication rounds, and message dimension into a single comparable constraint. Second, we propose SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, which lets us isolate the effect of bandwidth from the effect of policy capacity while still benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks in which communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.