Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Abstract

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. These broad privileges enable extensive automation and powerful personalization, but they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions for safety analysis: Capability, Identity, and Knowledge. Our evaluation covers 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism. The strongest defense still yields a 63.8% attack success rate under Capability-targeted attacks, and file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture and that securing personal AI agents requires more systematic safeguards. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
