Decoupling Communication from Policy: Robust Multi-Agent Reinforcement Learning under Bandwidth Constraints

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we propose SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.