Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, and observe a high average BGD rate of 80.8% across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions because the user requested them). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.
