The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
September 16, 2025
Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
cs.AI
Abstract
As large language models (LLMs) become integral to multi-agent systems, new
privacy risks emerge that extend beyond memorization, direct inference, or
single-turn evaluations. In particular, seemingly innocuous responses, when
composed across interactions, can cumulatively enable adversaries to recover
sensitive information, a phenomenon we term compositional privacy leakage. We
present the first systematic study of such compositional privacy leaks and
possible mitigation methods in multi-agent LLM systems. First, we develop a
framework that models how auxiliary knowledge and agent interactions jointly
amplify privacy risks, even when each response is benign in isolation. Next, to
mitigate this, we propose and evaluate two defense strategies: (1)
Theory-of-Mind defense (ToM), where defender agents infer a questioner's intent
by anticipating how their outputs may be exploited by adversaries, and (2)
Collaborative Consensus Defense (CoDef), where responder agents collaborate
with peers who vote based on a shared aggregated state to restrict sensitive
information spread. Crucially, we balance our evaluation across compositions
that expose sensitive information and compositions that yield benign
inferences. Our experiments quantify how these defense strategies differ in
balancing the privacy-utility trade-off. We find that while chain-of-thought
alone offers limited protection against leakage (~39% sensitive blocking rate), our
ToM defense substantially improves sensitive query blocking (up to 97%) but can
reduce benign task success. CoDef achieves the best balance, yielding the
highest Balanced Outcome (79.8%), highlighting the benefit of combining
explicit reasoning with defender collaboration. Together, our results expose a
new class of risks in collaborative LLM deployments and provide actionable
insights for designing safeguards against compositional, context-driven privacy
leakage.
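The CoDef mechanism described above, where peer agents vote over a shared aggregated state before a response is released, can be illustrated with a minimal sketch. All names here (`would_leak`, `codef_gate`) are hypothetical, and the paper's LLM agents are stubbed with a rule-based voter; this is not the authors' implementation, only an illustration of consensus gating against compositional leakage.

```python
# Minimal sketch of a CoDef-style consensus gate (hypothetical names;
# the actual defenders are LLM agents, stubbed here with a rule-based voter).

def would_leak(shared_state, candidate, sensitive_fact):
    # A response is risky if, composed with everything already released,
    # it lets an adversary assemble every piece of the sensitive fact.
    released = shared_state | {candidate}
    return sensitive_fact <= released  # subset test on released snippets

def codef_gate(voters, shared_state, candidate, sensitive_fact):
    # Each peer votes BLOCK (True) if it judges the composition risky;
    # a majority of BLOCK votes withholds the response.
    votes = [voter(shared_state, candidate, sensitive_fact) for voter in voters]
    if sum(votes) > len(votes) / 2:
        return "BLOCK"
    shared_state.add(candidate)  # release: update the aggregated state
    return "ALLOW"

# Three identical rule-based voters standing in for peer agents.
voters = [would_leak] * 3
state = {"works at Acme"}                      # snippets already released
secret = {"works at Acme", "salary is 90k"}    # sensitive composition

print(codef_gate(voters, state, "favorite color", secret))  # ALLOW (benign)
print(codef_gate(voters, state, "salary is 90k", secret))   # BLOCK (completes the secret)
```

The key property this sketch captures is that the second query is blocked only because of what was released before it: neither snippet is sensitive in isolation, which is exactly the compositional leakage the abstract describes.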