Towards Trustworthy GUI Agents: A Survey

Abstract

GUI agents, powered by large foundation models, can interact with digital interfaces, enabling applications in web automation, mobile navigation, and software testing. However, their increasing autonomy has raised critical concerns about security, privacy, and safety. This survey examines the trustworthiness of GUI agents along five dimensions: security vulnerabilities, reliability in dynamic environments, transparency and explainability, ethical considerations, and evaluation methodologies. We also identify major challenges, including vulnerability to adversarial attacks, cascading failure modes in sequential decision-making, and a lack of realistic evaluation benchmarks. These issues not only hinder real-world deployment but also call for comprehensive mitigation strategies that go beyond task success. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential. This survey provides a foundation for advancing trustworthy GUI agents through systematic understanding and future research.