FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Abstract

Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals. These pages often exceed tens of thousands of tokens, which saturates context limits and increases computational cost. Moreover, processing full pages exposes agents to security risks such as prompt injection. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that uses a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations, guided by the task goal. By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning while reducing vulnerability to injection attacks. Experiments on the WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success in attack-free settings. Our results highlight that targeted LLM-based retrieval is a practical and robust strategy for building web agents that are efficient, effective, and secure.
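To make the line-level pruning idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the AxTree is available as plain text with one node per line, and that a retriever returns inclusive, 1-indexed line ranges to keep. The names `number_lines`, `keyword_retriever`, and `prune_axtree` are hypothetical, and the keyword matcher is only a stand-in for the lightweight LLM retriever described above.

```python
import re
from typing import Callable, List, Tuple

LineRange = Tuple[int, int]  # inclusive, 1-indexed (assumed convention)
Retriever = Callable[[str, List[str]], List[LineRange]]


def number_lines(axtree: str) -> List[str]:
    """Prefix each AxTree line with its 1-indexed line number."""
    return [f"{i}: {line}" for i, line in enumerate(axtree.splitlines(), start=1)]


def keyword_retriever(goal: str, numbered: List[str]) -> List[LineRange]:
    """Hypothetical stand-in for an LLM retriever.

    Keeps any line that shares a word longer than three characters with the goal.
    A real system would instead prompt a small LLM with the goal and the
    numbered lines, then parse the line ranges it returns.
    """
    goal_words = {w for w in re.findall(r"[a-z0-9]+", goal.lower()) if len(w) > 3}
    hits: List[LineRange] = []
    for i, line in enumerate(numbered, start=1):
        tokens = set(re.findall(r"[a-z0-9]+", line.lower()))
        if goal_words & tokens:
            hits.append((i, i))
    return hits


def prune_axtree(axtree: str, goal: str, retriever: Retriever = keyword_retriever) -> str:
    """Return only the AxTree lines selected by the retriever, in original order."""
    lines = axtree.splitlines()
    keep = set()
    for start, end in retriever(goal, number_lines(axtree)):
        # Clamp ranges so a malformed retriever answer cannot index out of bounds.
        keep.update(range(max(1, start), min(len(lines), end) + 1))
    return "\n".join(line for i, line in enumerate(lines, start=1) if i in keep)
```

In this sketch, a banner line that is unrelated to the goal is simply never passed to the acting agent, which illustrates why pruning can also reduce exposure to injected content. How well that works in practice depends entirely on the retriever's quality.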
