CaMeLs Can Use Computers Too: System-Level Security for Computer Use Agents

Abstract

AI agents are vulnerable to prompt injection attacks, in which malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation, which strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs), systems that automate tasks by viewing screens and executing actions, presents a fundamental challenge: current agents observe the UI state continuously to decide each action, which conflicts with the isolation that security requires. We resolve this tension by showing that UI workflows, while dynamic, are structurally predictable. We introduce Single-Shot Planning for CUAs, in which a trusted planner generates a complete execution graph with conditional branches before observing any potentially malicious content, providing provable control flow integrity guarantees against arbitrary instruction injections. Although this architectural isolation prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks, which manipulate UI elements to trigger unintended but valid paths within the plan. We evaluate our design on OSWorld: it retains up to 57% of the performance of frontier models while improving the performance of smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs.
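To make the isolation argument concrete, here is a minimal sketch of what a single-shot plan with conditional branches might look like. It assumes a hypothetical plan format of our own (`Action`, `Node`, `validate`, `execute` are illustrative names, not the authors' API) in which the only thing the untrusted screen can contribute at run time is a yes/no answer to a question fixed at planning time. Under that assumption, injected text can never add an action outside the plan, but a manipulated screen (here, a fake "replace file?" dialog) can still select a different pre-planned branch, which is the Branch Steering risk the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical action type: the trusted planner emits these before any observation.
@dataclass(frozen=True)
class Action:
    kind: str       # e.g. "click" or "type"
    target: str     # UI element description fixed at planning time
    text: str = ""  # text to type, if any


# Hypothetical plan node: a fixed action list plus an optional yes/no branch.
@dataclass
class Node:
    actions: List[Action]
    question: Optional[str] = None  # asked about the screen at run time
    on_true: Optional[str] = None   # successor if the answer is yes
    on_false: Optional[str] = None  # successor if the answer is no
    next: Optional[str] = None      # unconditional successor (None = stop)


Plan = Dict[str, Node]


def validate(plan: Plan, start: str) -> None:
    """Reject plans whose edges point at nodes the planner never created."""
    if start not in plan:
        raise ValueError(f"unknown start node {start!r}")
    for nid, node in plan.items():
        for succ in (node.next, node.on_true, node.on_false):
            if succ is not None and succ not in plan:
                raise ValueError(f"node {nid!r} points to unknown node {succ!r}")


def execute(plan: Plan, start: str, ask: Callable[[str], bool],
            max_steps: int = 50) -> List[Action]:
    """Walk the fixed graph; `ask` (assumed to read the untrusted screen)
    returns only a bool, so it can pick a branch but never add an action."""
    trace: List[Action] = []
    node_id: Optional[str] = start
    for _ in range(max_steps):  # bound cycles such as retry loops
        if node_id is None:
            return trace
        node = plan[node_id]
        trace.extend(node.actions)
        if node.question is None:
            node_id = node.next
        else:
            node_id = node.on_true if ask(node.question) else node.on_false
    raise RuntimeError("step budget exceeded")


def make_plan() -> Plan:
    # Illustrative task: save as report.pdf, confirming overwrite only if asked.
    return {
        "save": Node(
            actions=[Action("type", "file name field", "report.pdf"),
                     Action("click", "Save button")],
            question="Is a dialog asking to replace an existing file visible?",
            on_true="confirm",
            on_false=None,
        ),
        "confirm": Node(actions=[Action("click", "Replace button")]),
    }


plan = make_plan()
validate(plan, "save")
benign = execute(plan, "save", ask=lambda q: False)
# An attacker who draws a fake "replace file?" dialog flips the answer:
steered = execute(plan, "save", ask=lambda q: True)
print(len(benign), len(steered))  # 2 3
```

Both traces consist only of planned actions, which is the control flow integrity property; the steered run nonetheless performs an overwrite the user may not have wanted, illustrating why branch conditions themselves need additional protection.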