CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Abstract

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs) -- systems that automate tasks by viewing screens and executing actions -- presents a fundamental challenge: current agents require continuous observation of UI state to determine each action, conflicting with the isolation required for security. We resolve this tension by demonstrating that UI workflows, while dynamic, are structurally predictable. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content, providing provable control flow integrity guarantees against arbitrary instruction injections. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks, which manipulate UI elements to trigger unintended valid paths within the plan. We evaluate our design on OSWorld, and retain up to 57% of the performance of frontier models while improving performance for smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs.
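
The single-shot idea can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Step`, `PLAN`, `run_plan`, and `classify`, the label strings, and the "enable dark mode" task are all hypothetical. The sketch assumes the trusted planner emits the whole graph up front, and that untrusted screen content can influence execution only by returning a label that selects among branches already present in that graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One node of a plan that is fixed before any screen is observed."""
    action: str  # action chosen by the trusted planner
    # Maps an observation label to the id of the next step; empty means "stop".
    branches: Dict[str, str] = field(default_factory=dict)


# Hypothetical plan for "enable dark mode", written entirely up front.
PLAN: Dict[str, Step] = {
    "open": Step("open_settings", {"dialog_shown": "dismiss", "no_dialog": "toggle"}),
    "dismiss": Step("close_dialog", {"closed": "toggle"}),
    "toggle": Step("toggle_dark_mode"),
}


def run_plan(
    plan: Dict[str, Step],
    start: str,
    classify: Callable[[str], str],
    max_steps: int = 50,
) -> List[str]:
    """Execute the plan and return the actions taken, in order.

    `classify` stands in for whatever component reads the untrusted screen.
    It receives the current step id and returns a label; the label can only
    pick one of the branches the planner already wrote down.
    """
    trace: List[str] = []
    current = start
    for _ in range(max_steps):
        step = plan[current]
        trace.append(step.action)
        if not step.branches:
            return trace  # terminal step reached
        label = classify(current)  # untrusted input enters only here
        if label not in step.branches:
            # Anything outside the planned labels is rejected, never executed.
            raise ValueError(f"unplanned observation label at step {current!r}")
        current = step.branches[label]
    raise RuntimeError("plan exceeded max_steps")
```

Calling `run_plan(PLAN, "open", lambda step_id: "no_dialog")` walks the two-step path, while a classifier that returns injected text such as an instruction string raises `ValueError` because that text is not a planned label. A classifier that is tricked into returning `"dialog_shown"` still executes only planned actions, but along a different valid path, which is the kind of residual risk the abstract calls Branch Steering.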
