A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
February 16, 2026
Authors: Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang
cs.AI
Abstract
Clawdbot is a self-hosted, tool-using personal AI agent whose broad action space spans local execution and web-mediated workflows, raising heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments, and outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We supplement the aggregate results with representative case studies, summarize their common characteristics, and analyze the security vulnerabilities and typical failure modes that Clawdbot is prone to trigger in practice.
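
As a rough illustration of the trajectory-centric logging described above, the sketch below shows one plausible record format for a logged step and a placeholder hook for an automated judge. The field names, the `judge_trajectory` stub, and the verdict labels are assumptions for illustration only; the paper's actual logging schema and the AgentDoG-Qwen3-4B interface are not specified in this abstract.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional
import json


@dataclass
class TrajectoryStep:
    """One logged step: a message, an action, or a tool call with its result.

    Field names are illustrative assumptions, not Clawdbot's actual schema.
    """
    role: str                              # e.g. "user", "agent", "tool"
    content: str                           # message text or action description
    tool_name: Optional[str] = None        # set when the step is a tool call
    tool_args: Dict[str, Any] = field(default_factory=dict)
    tool_output: Optional[str] = None


@dataclass
class Trajectory:
    case_id: str                           # test-case identifier
    risk_dimension: str                    # one of the six risk dimensions
    steps: List[TrajectoryStep] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


def judge_trajectory(traj: Trajectory) -> str:
    """Placeholder for an automated trajectory judge.

    A real pipeline would send the serialized trajectory to a judge model
    (the paper uses AgentDoG-Qwen3-4B) and parse its verdict; here a trivial
    keyword heuristic stands in so the sketch stays self-contained.
    """
    risky = any(step.tool_name == "shell" and "rm -rf" in step.content
                for step in traj.steps)
    return "unsafe" if risky else "needs_human_review"


if __name__ == "__main__":
    traj = Trajectory(case_id="demo-001", risk_dimension="local_execution")
    traj.steps.append(TrajectoryStep(role="user",
                                     content="Clean up my temp files"))
    traj.steps.append(TrajectoryStep(role="agent", content="rm -rf /tmp/*",
                                     tool_name="shell",
                                     tool_args={"cmd": "rm -rf /tmp/*"},
                                     tool_output="(not executed)"))
    print(traj.to_json())
    print("judge verdict:", judge_trajectory(traj))
```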