FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
October 3, 2025
Authors: Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste
cs.AI
Abstract
Web agents powered by large language models (LLMs) must process lengthy web
page observations to complete user goals; these pages often exceed tens of
thousands of tokens. This saturates context limits and increases computational
cost; moreover, processing full pages exposes agents to security
risks such as prompt injection. Existing pruning strategies either discard
relevant content or retain irrelevant context, leading to suboptimal action
prediction. We introduce FocusAgent, a simple yet effective approach that
leverages a lightweight LLM retriever to extract the most relevant lines from
accessibility tree (AxTree) observations, guided by task goals. By pruning
noisy and irrelevant content, FocusAgent enables efficient reasoning while
reducing vulnerability to injection attacks. Experiments on WorkArena and
WebArena benchmarks show that FocusAgent matches the performance of strong
baselines, while reducing observation size by over 50%. Furthermore, a variant
of FocusAgent significantly reduces the success rate of prompt-injection
attacks, including banner and pop-up attacks, while maintaining task success
performance in attack-free settings. Our results highlight that targeted
LLM-based retrieval is a practical and robust strategy for building web agents
that are efficient, effective, and secure.
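
The line-level retrieval idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`number_lines`, `prune_axtree`, `keyword_retriever`), the line-range output format, and the example AxTree are all assumptions made for illustration. In particular, `keyword_retriever` is a simple keyword matcher standing in for the lightweight LLM retriever, so that the example runs without any model call.

```python
from typing import Callable, List, Tuple

# Inclusive, 1-indexed line range (an assumed output format for the retriever).
Range = Tuple[int, int]
Retriever = Callable[[str, List[str]], List[Range]]


def number_lines(axtree: str) -> List[str]:
    """Prefix each AxTree line with its 1-indexed number so a retriever can cite it."""
    return [f"{i}: {line}" for i, line in enumerate(axtree.splitlines(), start=1)]


def prune_axtree(axtree: str, goal: str, retriever: Retriever) -> str:
    """Keep only the AxTree lines that the retriever marks as relevant to the goal."""
    lines = axtree.splitlines()
    keep = set()
    for start, end in retriever(goal, number_lines(axtree)):
        # Clamp ranges to the document so malformed retriever output cannot crash us.
        keep.update(range(max(1, start), min(len(lines), end) + 1))
    return "\n".join(line for i, line in enumerate(lines, start=1) if i in keep)


def keyword_retriever(goal: str, numbered: List[str]) -> List[Range]:
    """Hypothetical stand-in for the LLM retriever: match goal keywords per line."""
    words = {w.lower() for w in goal.split() if len(w) > 3}
    return [
        (i, i)
        for i, line in enumerate(numbered, start=1)
        if any(w in line.lower() for w in words)
    ]


# Made-up AxTree observation, for illustration only.
EXAMPLE_AXTREE = "\n".join([
    "[1] RootWebArea 'Shop'",
    "[2] banner 'Free shipping on all orders'",
    "[3] textbox 'Search products'",
    "[4] button 'Search'",
    "[5] link 'Careers'",
])

pruned = prune_axtree(EXAMPLE_AXTREE, "Search for a laptop", keyword_retriever)
print(pruned)
```

In this toy run, only the search textbox and search button survive; the banner line, which is where an injected instruction would typically sit, is dropped before the agent ever sees it. Swapping `keyword_retriever` for a function that prompts a small LLM and parses its returned line ranges would keep the rest of the sketch unchanged.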