Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, and observe a high average BGD rate (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions because the user requested them). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on mitigating this fundamental risk and ensuring safe CUA deployment.
