LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Abstract

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations that are applied to cache artifacts before they are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that, compared to standard KV-sharing baselines, LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance.
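The adversarial formulation described above can be illustrated with a generic min-max objective. This is only a sketch under assumed notation, not the paper's stated loss: the abstract does not specify the losses or their weighting. Here $K$ denotes the sending agent's KV cache, $x_s$ the agent-specific sensitive input, $y$ the task target, $T_\theta$ the learned transformation (the guard), $D_\phi$ the adversarial decoder, $\mathcal{L}_{\text{task}}$ and $\mathcal{L}_{\text{rec}}$ hypothetical task and reconstruction losses, and $\lambda > 0$ an assumed trade-off weight.

```latex
% Illustrative objective only; all symbols are assumptions, not taken from the paper.
\min_{\theta} \; \max_{\phi} \;
\mathbb{E}_{(K,\, x_s,\, y)}
\Big[
  \mathcal{L}_{\text{task}}\big(T_\theta(K),\, y\big)
  \;-\; \lambda \, \mathcal{L}_{\text{rec}}\big(D_\phi(T_\theta(K)),\, x_s\big)
\Big]
```

Under this reading, the inner maximization over $\phi$ drives the decoder to minimize its reconstruction error on the transformed cache, while the outer minimization over $\theta$ keeps the task loss low and pushes the best achievable reconstruction error up. A transformed cache would then count as safer the higher the reconstruction error of a well-trained decoder remains.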
