Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
October 2, 2025
Authors: Erfan Shayegani, Keegan Hines, Yue Dong, Nael Abu-Ghazaleh, Roman Lutz, Spencer Whitehead, Vidhisha Balachandran, Besmira Nushi, Vibhav Vineet
cs.AI
Abstract
Computer-Use Agents (CUAs) are an increasingly deployed class of agents that
take actions on GUIs to accomplish user goals. In this paper, we show that CUAs
consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals
regardless of feasibility, safety, reliability, or context. We characterize
three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii)
assumptions and decisions under ambiguity, and (iii) contradictory or
infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these
three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and
employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement
with human annotations. We use BLIND-ACT to evaluate nine frontier models,
including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing
high average BGD rates (80.8%) across them. We show that BGD exposes subtle
risks that arise even when inputs are not directly harmful. While
prompting-based interventions lower BGD levels, substantial risk persists,
highlighting the need for stronger training- or inference-time interventions.
Qualitative analysis reveals three failure modes: execution-first bias
(focusing on how to act over whether to act), thought-action disconnect
(execution diverging from reasoning), and request-primacy (justifying actions
because the user requested them). Identifying BGD and introducing BLIND-ACT
establish a foundation for future research on understanding and mitigating
this fundamental risk and ensuring safe CUA deployment.