FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Abstract

Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often run to tens of thousands of tokens. This saturates context limits and increases computational cost; moreover, processing full pages exposes agents to security risks such as prompt injection. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that uses a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations, guided by the task goal. By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning while reducing vulnerability to injection attacks. Experiments on the WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success in attack-free settings. Our results highlight that targeted LLM-based retrieval is a practical and robust strategy for building web agents that are efficient, effective, and secure.
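
The line-level pruning idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `prune_observation` and `keyword_retriever` are hypothetical, and the retriever here is a simple keyword-overlap stand-in for the lightweight LLM retriever, used only so the example runs without a model. The sketch assumes the AxTree is available as plain text with one element per line and that the retriever returns the indices of the lines to keep.

```python
from typing import Callable, List

# A retriever takes the task goal and the AxTree lines and returns the
# indices of the lines it considers relevant. In FocusAgent this role is
# played by a lightweight LLM; the signature here is an assumption.
Retriever = Callable[[str, List[str]], List[int]]


def keyword_retriever(goal: str, lines: List[str]) -> List[int]:
    """Hypothetical stand-in for the LLM retriever.

    Keeps a line if it shares at least one word (longer than 3 characters)
    with the goal. A real retriever would prompt an LLM instead.
    """
    goal_words = {w.lower() for w in goal.split() if len(w) > 3}
    keep = []
    for i, line in enumerate(lines):
        words = {w.strip("\"'.,:[]()").lower() for w in line.split()}
        if goal_words & words:
            keep.append(i)
    return keep


def prune_observation(goal: str, axtree: str, retriever: Retriever) -> str:
    """Return only the AxTree lines selected by the retriever, in order."""
    lines = axtree.splitlines()
    keep = set(retriever(goal, lines))
    return "\n".join(line for i, line in enumerate(lines) if i in keep)


if __name__ == "__main__":
    axtree = "\n".join([
        "[1] banner 'Limited offer: click here'",
        "[2] textbox 'Search orders'",
        "[3] button 'Submit order'",
        "[4] link 'Privacy policy'",
    ])
    print(prune_observation("Submit the order form", axtree, keyword_retriever))
```

Because the retriever is passed in as a callable, the keyword heuristic can be swapped for a model-backed function without changing the pruning step. In this toy input the banner line is dropped simply because it shares no words with the goal; the sketch makes no claim about how the actual system handles injected content.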
