FocusAgent:簡潔而高效的網頁代理大上下文精簡方法
FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
October 3, 2025
作者: Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste
cs.AI
摘要
基於大型語言模型(LLMs)的網路代理在完成用戶目標時,必須處理冗長的網頁觀察數據;這些頁面通常超過數萬個標記。這不僅會飽和上下文限制,還增加了計算成本;此外,處理完整頁面會使代理面臨如提示注入等安全風險。現有的修剪策略要麼丟失相關內容,要麼保留無關上下文,導致次優的行動預測。我們提出了FocusAgent,這是一種簡單而有效的方法,利用輕量級LLM檢索器從可訪問性樹(AxTree)觀察中提取最相關的行,並以任務目標為指導。通過修剪噪聲和無關內容,FocusAgent實現了高效推理,同時降低了對注入攻擊的脆弱性。在WorkArena和WebArena基準測試中的實驗表明,FocusAgent與強基線的性能相當,同時將觀察大小減少了50%以上。此外,FocusAgent的一個變體顯著降低了提示注入攻擊的成功率,包括橫幅和彈出攻擊,同時在無攻擊環境中保持任務成功性能。我們的結果強調,基於LLM的定向檢索是一種實用且穩健的策略,用於構建高效、有效且安全的網路代理。
English
Web agents powered by large language models (LLMs) must process lengthy web
page observations to complete user goals; these pages often exceed tens of
thousands of tokens. This saturates context limits and increases computational
cost processing; moreover, processing full pages exposes agents to security
risks such as prompt injection. Existing pruning strategies either discard
relevant content or retain irrelevant context, leading to suboptimal action
prediction. We introduce FocusAgent, a simple yet effective approach that
leverages a lightweight LLM retriever to extract the most relevant lines from
accessibility tree (AxTree) observations, guided by task goals. By pruning
noisy and irrelevant content, FocusAgent enables efficient reasoning while
reducing vulnerability to injection attacks. Experiments on WorkArena and
WebArena benchmarks show that FocusAgent matches the performance of strong
baselines, while reducing observation size by over 50%. Furthermore, a variant
of FocusAgent significantly reduces the success rate of prompt-injection
attacks, including banner and pop-up attacks, while maintaining task success
performance in attack-free settings. Our results highlight that targeted
LLM-based retrieval is a practical and robust strategy for building web agents
that are efficient, effective, and secure.