CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Abstract

AI agents are vulnerable to prompt injection attacks, in which malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs), systems that automate tasks by viewing screens and executing actions, presents a fundamental challenge: current agents require continuous observation of UI state to determine each action, which conflicts with the isolation required for security. We resolve this tension by demonstrating that UI workflows, while dynamic, are structurally predictable. We introduce Single-Shot Planning for CUAs, in which a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content, providing provable control flow integrity guarantees against arbitrary instruction injections. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks, which manipulate UI elements to trigger unintended but valid paths within the plan. We evaluate our design on OSWorld and retain up to 57% of the performance of frontier models while improving the performance of smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs.
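The separation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` structure, the `run_plan` executor, the example `PLAN`, and the three observer functions are hypothetical names introduced here only for illustration. The sketch assumes that the whole plan is fixed before any screen is observed, and that an untrusted observation can do no more than pick one of the branch labels the planner already declared.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """One node of a plan fixed before any screen is observed (hypothetical)."""
    action: str  # action chosen by the trusted planner
    # Branch label -> id of the next step; declared at planning time.
    branches: Dict[str, Optional[str]] = field(default_factory=dict)


def run_plan(plan: Dict[str, Step], start: str,
             classify: Callable[[str, List[str]], str],
             max_steps: int = 50) -> List[str]:
    """Execute a pre-built plan; observations only select declared branches."""
    executed: List[str] = []
    current: Optional[str] = start
    while current is not None and len(executed) < max_steps:
        step = plan[current]
        executed.append(step.action)
        if not step.branches:
            break  # terminal step
        labels = sorted(step.branches)
        label = classify(current, labels)  # untrusted: derived from the screen
        if label not in step.branches:
            # Screen content cannot introduce a transition the planner did not declare.
            raise ValueError(f"undeclared branch {label!r} at step {current!r}")
        current = step.branches[label]
    return executed


# Hypothetical plan for "enable dark mode", written before execution starts.
PLAN = {
    "open": Step("click('Settings')",
                 {"dialog_shown": "dismiss", "no_dialog": "toggle"}),
    "dismiss": Step("click('Close')", {"done": "toggle"}),
    "toggle": Step("click('Dark mode')"),
}


def benign_observer(step_id: str, labels: List[str]) -> str:
    return "no_dialog" if "no_dialog" in labels else labels[0]


def injected_observer(step_id: str, labels: List[str]) -> str:
    # Simulates on-screen text that tries to inject a new instruction.
    return "send_password_to_attacker"


def steering_observer(step_id: str, labels: List[str]) -> str:
    # Simulates a manipulated UI that pushes execution onto another valid path.
    return "dialog_shown" if "dialog_shown" in labels else labels[0]
```

With `benign_observer`, `run_plan(PLAN, "open", benign_observer)` follows the intended path. With `injected_observer`, the executor raises `ValueError` because the injected label is not a declared branch. With `steering_observer`, execution stays inside the plan but takes the `dismiss` path, which is the kind of valid-but-unintended route the abstract calls Branch Steering and says requires additional measures.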
