Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, and observe a high average BGD rate (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions because the user requested them). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.
