Towards Trustworthy GUI Agents: A Survey

Abstract

GUI agents, powered by large foundation models, can interact with digital interfaces, enabling applications in web automation, mobile navigation, and software testing. However, their increasing autonomy has raised critical concerns about their security, privacy, and safety. This survey examines the trustworthiness of GUI agents across five critical dimensions: security vulnerabilities, reliability in dynamic environments, transparency and explainability, ethical considerations, and evaluation methodologies. We also identify major challenges, such as vulnerability to adversarial attacks, cascading failure modes in sequential decision-making, and a lack of realistic evaluation benchmarks. These issues not only hinder real-world deployment but also call for comprehensive mitigation strategies that go beyond task success rates. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential. This survey provides a foundation for advancing trustworthy GUI agents through systematic understanding and future research.