WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

Abstract

Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, completing complex browsing tasks across diverse domains. However, current agents are prone to repeating the same errors and cannot learn from past experience across sessions, which limits their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling improved long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency and decides whether to inject task-specific advice into the agent via runtime hooks. This design lets web agents access long-term memory beyond their native context window, improving robustness in complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, so agents improve over time without retraining. Evaluations on the WebVoyager benchmark show that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it raises the task success rate from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models with WebCoach achieve performance comparable to the same web agent using GPT-4o.
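
The Coach's retrieval step, which ranks past experiences by similarity and recency and then decides whether to intervene, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the scoring rule, so the `Episode` schema, the cosine-similarity-times-exponential-decay score, the `half_life` parameter, and the injection `threshold` are hypothetical and are not taken from WebCoach itself.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One condensed past trajectory (hypothetical schema, not from the paper)."""
    summary: str             # condensed description of the trajectory
    embedding: List[float]   # vector representation of the summary
    age_in_sessions: int     # how many sessions ago the episode was stored
    advice: str              # lesson that could be passed to the agent


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query: List[float], episode: Episode, half_life: float = 10.0) -> float:
    """Similarity weighted by exponential recency decay (assumed scoring rule)."""
    recency = 0.5 ** (episode.age_in_sessions / half_life)
    return cosine(query, episode.embedding) * recency


def coach(query: List[float], memory: List[Episode],
          k: int = 2, threshold: float = 0.5) -> List[str]:
    """Return advice from the top-k episodes whose score clears the threshold.

    An empty list means the coach chooses not to intervene.
    """
    ranked = sorted(memory, key=lambda e: score(query, e), reverse=True)
    return [e.advice for e in ranked[:k] if score(query, e) >= threshold]


# Toy memory with hand-made 2-D embeddings (illustrative data only).
memory = [
    Episode("flight search failed on date picker", [1.0, 0.0], 0,
            "Type the date instead of clicking the picker."),
    Episode("recipe search succeeded", [0.0, 1.0], 0,
            "Use the site search bar first."),
    Episode("old flight search", [1.0, 0.0], 20,
            "Stale advice."),
]
advice = coach([1.0, 0.0], memory)
```

In this toy setup the old flight episode matches the query exactly, but its recency weight drops it below the threshold, so only the recent episode's advice is returned. A real system would presumably obtain embeddings from a text encoder and tune the decay and threshold empirically.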
