Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, such as search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck, in which a single shared latent representation serves both policy execution and inter-agent communication. Consequently, reducing the message size directly shrinks the policy's latent space, which often causes significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while still benefiting from in-step communication. We evaluate our method on several partially observable MARL benchmarks in which communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
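To make the idea of a single comparable budget concrete, the sketch below shows one hypothetical way to fold sparsity, rounds, and message dimension into a normalised value. The abstract does not state the formula for β, so the product form, the function name `bandwidth_budget`, and the reference values `ref_rounds` and `ref_msg_dim` are assumptions made purely for illustration.

```python
def bandwidth_budget(send_prob, rounds, msg_dim, ref_rounds, ref_msg_dim):
    """Hypothetical normalised per-agent bandwidth budget in [0, 1].

    ASSUMPTION: the budget is the expected number of values an agent sends
    per step (send probability x rounds x message dimension), divided by the
    same quantity for a reference configuration that always sends.
    This is an illustrative definition, not the paper's.
    """
    if not 0.0 <= send_prob <= 1.0:
        raise ValueError("send_prob must lie in [0, 1]")
    if rounds > ref_rounds or msg_dim > ref_msg_dim:
        raise ValueError("configuration exceeds the reference budget")
    used = send_prob * rounds * msg_dim   # expected values sent per step
    full = ref_rounds * ref_msg_dim       # reference: dense, all rounds
    return used / full


# Two different configurations that consume the same assumed budget:
sparse_two_rounds = bandwidth_budget(0.5, 2, 16, ref_rounds=2, ref_msg_dim=16)
dense_one_round = bandwidth_budget(1.0, 1, 16, ref_rounds=2, ref_msg_dim=16)
```

Under this assumed definition, sending half the time over two rounds and sending always over one round map to the same budget, which is the kind of comparison a unified constraint is meant to allow.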