Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on graphical user interfaces (GUIs) to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias toward pursuing goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, and observe a high average BGD rate (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act rather than whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions by appeal to the user's request). Identifying BGD and introducing BLIND-ACT establish a foundation for future research on studying and mitigating this fundamental risk and on ensuring safe CUA deployment.
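The two headline numbers above can be read as simple proportions. The sketch below is a minimal illustration under stated assumptions, not the benchmark's code: it assumes a model's BGD rate is the fraction of tasks on which the judge flags the agent as blindly goal-directed, that the reported average is an unweighted mean over models, and that judge-human agreement is raw percent agreement on binary labels. The function names and toy labels are hypothetical and do not come from the paper.

```python
from statistics import mean


def percent_agreement(judge_labels: list[bool], human_labels: list[bool]) -> float:
    """Share of trajectories on which the LLM judge and the human annotator agree.

    Assumption: both label lists are binary BGD / no-BGD verdicts, aligned by index.
    """
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


def bgd_rate(flags: list[bool]) -> float:
    """Fraction of tasks on which one agent was flagged as blindly goal-directed."""
    if not flags:
        raise ValueError("need at least one task")
    return sum(flags) / len(flags)


def average_bgd_rate(per_model_flags: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-model BGD rates (hypothetical aggregation)."""
    return mean(bgd_rate(flags) for flags in per_model_flags.values())


# Toy labels for illustration only, not data from the paper.
judge = [True] * 15 + [False]
human = [True] * 16
print(percent_agreement(judge, human))  # 15 of 16 agree -> 0.9375

toy_runs = {"model_a": [True, True, False, True], "model_b": [True, False]}
print(average_bgd_rate(toy_runs))  # mean of 0.75 and 0.5 -> 0.625
```

Under these assumptions, an agreement of 93.75% simply means the judge matched the human verdict on 15 out of every 16 labeled cases; the paper's actual evaluation set size is not stated here.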
