Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Abstract

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate in persistent social environments alongside other agents. We introduce a Moltbook-style simulation platform in which thousands of LLM agents interact across communities over a simulated month, and we use it to evaluate privacy as a downstream safety concern under varying degrees of social pressure. We report three findings. First, moving from single-turn to multi-turn social evaluation more than doubles privacy violations across OpenAI models, from 19.95% on CIMemories to 45.30% in our setting. Second, leakage is socially contagious: agents are 8 times more likely to disclose sensitive information after observing a peer do so. Third, explicit privacy instructions reduce but do not eliminate this effect, leaving leakage rates above 37.8% even with safeguards in place. Our findings suggest that static, chat-based safety benchmarks systematically underestimate risks in agentic deployment, and that social context alone is sufficient to elicit sensitive disclosures that single-turn evaluations would never surface.