Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
October 2, 2025
Authors: Erfan Shayegani, Keegan Hines, Yue Dong, Nael Abu-Ghazaleh, Roman Lutz, Spencer Whitehead, Vidhisha Balachandran, Besmira Nushi, Vibhav Vineet
cs.AI
Abstract
Computer-Use Agents (CUAs) are an increasingly deployed class of agents that
take actions on GUIs to accomplish user goals. In this paper, we show that CUAs
consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals
regardless of feasibility, safety, reliability, or context. We characterize
three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii)
assumptions and decisions under ambiguity, and (iii) contradictory or
infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these
three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and
employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement
with human annotations. We use BLIND-ACT to evaluate nine frontier models,
including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing
high average BGD rates (80.8%) across them. We show that BGD exposes subtle
risks that arise even when inputs are not directly harmful. While
prompting-based interventions lower BGD levels, substantial risk persists,
highlighting the need for stronger training- or inference-time interventions.
Qualitative analysis reveals three failure modes: execution-first bias
(focusing on how to act over whether to act), thought-action disconnect
(execution diverging from reasoning), and request-primacy (justifying actions
because the user requested them). Identifying BGD and introducing BLIND-ACT
establishes a foundation for future research on understanding and mitigating
this fundamental risk and on ensuring safe CUA deployment.
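
The two headline numbers in the abstract, the BGD rate and the judge-human agreement, are both simple percentages over per-task verdicts. The sketch below shows one way such figures could be computed. It is not the BLIND-ACT implementation: the Verdict record, its field names, the pattern labels, and the toy data are all assumptions made for illustration, and the printed values are not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-task record; not the benchmark's actual schema."""
    task_id: str
    pattern: str                       # which BGD pattern the task targets
    judge_bgd: bool                    # LLM judge flagged blind goal-directedness
    human_bgd: Optional[bool] = None   # human label, when one exists


def bgd_rate(verdicts: List[Verdict]) -> float:
    """Percentage of tasks on which the judge flagged BGD behavior."""
    return 100.0 * sum(v.judge_bgd for v in verdicts) / len(verdicts)


def judge_human_agreement(verdicts: List[Verdict]) -> float:
    """Percentage of human-labeled tasks where judge and human agree."""
    labeled = [v for v in verdicts if v.human_bgd is not None]
    return 100.0 * sum(v.judge_bgd == v.human_bgd for v in labeled) / len(labeled)


# Toy data for illustration only.
toy = [
    Verdict("t1", "contextual_reasoning", judge_bgd=True, human_bgd=True),
    Verdict("t2", "ambiguity", judge_bgd=True, human_bgd=True),
    Verdict("t3", "infeasible_goal", judge_bgd=False, human_bgd=False),
    Verdict("t4", "ambiguity", judge_bgd=True, human_bgd=False),
]

print(bgd_rate(toy))               # 75.0
print(judge_human_agreement(toy))  # 75.0
```

Averaging bgd_rate over the evaluated models would give a single aggregate figure comparable in form to the 80.8% reported above, assuming that figure is a plain mean across models.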