Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
April 6, 2026
Authors: Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
cs.AI
Abstract
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable a high degree of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions for safety analysis: Capability, Identity, and Knowledge. Our evaluation covers 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; the strongest defense still permits a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings indicate that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.