**Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw**
April 6, 2026
Authors: Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
cs.AI
Abstract
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions (Capability, Identity, and Knowledge) for safety analysis. Our evaluation covers 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64–74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism: the strongest defense still admits a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings indicate that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.