LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Abstract

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.
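
The adversarial formulation described above can be read as a min-max objective. The following is only a sketch under assumed notation, none of which appears in the abstract: $T_\theta$ is the learned transformation applied to a KV cache $C$, $A_\phi$ is the adversarial decoder, $s$ is the agent-specific sensitive input, $y$ is the task target, $\mathcal{L}_{\text{task}}$ and $\mathcal{L}_{\text{rec}}$ are hypothetical task and reconstruction losses, and $\lambda \ge 0$ is a hypothetical trade-off weight.

```latex
% Illustrative only: symbols and loss names are assumptions, not the paper's notation.
\min_{\theta} \, \max_{\phi} \;
\mathbb{E}_{(C,\, s,\, y)} \Big[
  \mathcal{L}_{\text{task}}\big(T_\theta(C),\, y\big)
  \;-\; \lambda \, \mathcal{L}_{\text{rec}}\big(A_\phi(T_\theta(C)),\, s\big)
\Big]
```

Under this reading, the adversary (maximizing over $\phi$) drives the reconstruction loss down, while the guard (minimizing over $\theta$) keeps the task loss low and pushes the reconstruction loss up. The objective actually used by LCGuard may differ in form and weighting.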
