Capable but Careless: Do Computer-Use Agents Respect Contextual Integrity?

Abstract

Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. This cross-application access is useful, but it also creates a privacy risk that has been largely overlooked: while working in one context, an agent can pull in information from another context that is inappropriate to disclose there. We therefore introduce AgentCIBench, an evaluation harness that turns this risk into executable, deterministically scored scenarios. We target three common failure modes in CUAs: (1) visual co-location, where the agent pulls in prohibited items that sit next to the task target in the UI; (2) task-ambiguity overshare, where the agent dumps dense personal state in response to an under-specified prompt; and (3) recipient misalignment, where the agent sends content to an addressee for whom it is inappropriate. We evaluate 15 frontier agents and find a surprisingly high failure rate: 11 of 15 leak on more than 50% of scenarios, with an average leakage rate of 67.9%, and the same failures persist when agents act end-to-end in the environment to complete the task. We release AgentCIBench to encourage the development of safer computer-use agents and position contextual disclosure testing as a pre-deployment safety check.
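
To make "deterministically scored" concrete, the following is a minimal sketch of how a leak check and a per-agent leakage rate could be computed. It is not AgentCIBench's actual implementation: the `Scenario` fields, the substring-matching rule, the scenario names, and the sample strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Scenario:
    """Hypothetical test case: a task plus strings that must not be disclosed."""
    name: str
    failure_mode: str           # e.g. "visual_co_location" (label is an assumption)
    forbidden: Tuple[str, ...]  # items that are inappropriate in the task's context


def leaked_items(scenario: Scenario, agent_output: str) -> List[str]:
    """Return the forbidden items that appear in the output (case-insensitive)."""
    text = agent_output.casefold()
    return [item for item in scenario.forbidden if item.casefold() in text]


def leakage_rate(scenarios: List[Scenario], outputs: Dict[str, str]) -> float:
    """Fraction of scenarios in which at least one forbidden item was disclosed."""
    if not scenarios:
        return 0.0
    leaks = sum(1 for s in scenarios if leaked_items(s, outputs.get(s.name, "")))
    return leaks / len(scenarios)


# Illustrative data only; these scenarios and outputs are invented.
scenarios = [
    Scenario("reply_to_vendor", "visual_co_location", ("therapy appointment",)),
    Scenario("share_week_plan", "task_ambiguity_overshare", ("passport number",)),
]
outputs = {
    "reply_to_vendor": "Confirmed. Note: I have a therapy appointment at 3pm.",
    "share_week_plan": "Monday: design review. Tuesday: team lunch.",
}
print(leakage_rate(scenarios, outputs))  # 0.5
```

Because the check is a pure function of the scenario and the agent's output, the same transcript always yields the same score. A real harness would likely need a richer matching rule than substring search, for example to catch paraphrased disclosures.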