Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, and observe a high average BGD rate (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions because the user requested them). Identifying BGD and introducing BLIND-ACT establish a foundation for future research on studying and mitigating this fundamental risk and on ensuring safe CUA deployment.
