Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

Abstract

Recent advancements in Web AI agents have demonstrated remarkable capabilities in addressing complex web navigation tasks. However, emerging research shows that these agents are more vulnerable than standalone Large Language Models (LLMs), even though both are built on the same safety-aligned models. This discrepancy is particularly concerning because Web AI agents are more flexible than standalone LLMs, which may expose them to a wider range of adversarial user inputs. To build a scaffold that addresses these concerns, this study investigates the underlying factors that contribute to the increased vulnerability of Web AI agents. Notably, this disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as from complex signals, nuances that simple evaluation metrics such as success rate often fail to capture. To tackle these challenges, we propose a component-level analysis and a more granular, systematic evaluation framework. Through this fine-grained investigation, we identify three critical factors that amplify the vulnerability of Web AI agents: (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities. Our findings highlight the pressing need to enhance security and robustness in AI agent design and provide actionable insights for targeted defense strategies.