Summary: Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Abstract

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which organizes an agent's persistent state into three dimensions for safety analysis: Capability, Identity, and Knowledge. Our evaluation covers 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64-74%, and even the most robust model shows more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism. The strongest defense still allows a 63.8% attack success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture and that securing personal AI agents requires more systematic safeguards. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
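As a rough illustration of the scale of the reported change, the short sketch below computes how many times larger the average attack success rate becomes after poisoning, using only the averaged figures quoted in the abstract (24.6% baseline, 64-74% after poisoning). The variable and function names are illustrative assumptions, not taken from the paper. The "more than threefold" figure for the most robust model is measured against that model's own baseline, which the abstract does not report, so it is not reproduced here.

```python
# Average attack success rates (ASR) quoted in the abstract, in percent.
# The names below are illustrative, not identifiers from the paper.
BASELINE_ASR = 24.6
POISONED_ASR_RANGE = (64.0, 74.0)


def relative_increase(baseline: float, attacked: float) -> float:
    """Return how many times larger the attacked rate is than the baseline."""
    return attacked / baseline


# Ratio at the low and high ends of the reported post-poisoning range.
low, high = (relative_increase(BASELINE_ASR, asr) for asr in POISONED_ASR_RANGE)
print(f"Average ASR rises by a factor of {low:.2f} to {high:.2f}")
```

On these averaged numbers, poisoning a single CIK dimension multiplies the attack success rate by roughly 2.6 to 3.0.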
