WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
November 17, 2025
Authors: Genglin Liu, Shijie Geng, Sha Li, Hejie Cui, Sarah Zhang, Xin Liu, Tianyi Liu
cs.AI
Abstract
Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, enabling agents to complete complex browsing tasks across diverse domains. However, current agents struggle with repetitive errors and lack the ability to learn from past experiences across sessions, limiting their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling improved long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency, and decides whether to inject task-specific advice into the agent via runtime hooks. This design empowers web agents to access long-term memory beyond their native context window, improving robustness in complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, enabling agents to improve over time without retraining. Evaluations on the WebVoyager benchmark demonstrate that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it increases task success rates from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models with WebCoach achieve performance comparable to the same web agent using GPT-4o.
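The retrieval-and-advice loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the `Episode` fields, the `MemoryStore` and `coach_advice` names, the weighted sum of cosine similarity and an exponential recency decay, and the fixed injection threshold are all hypothetical. The abstract states only that the Coach retrieves experiences by similarity and recency and decides whether to inject advice.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    """One condensed trajectory (hypothetical schema, not from the paper)."""
    summary: str            # concise summary, as a WebCondenser-like step might emit
    embedding: list[float]  # vector representation of the task or summary
    timestamp: float        # time the episode was recorded, in seconds
    success: bool           # whether the trajectory completed its task


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """External store of episodes, ranked by similarity blended with recency."""

    def __init__(self, half_life=86400.0, recency_weight=0.3):
        self.episodes = []
        self.half_life = half_life            # assumed: recency halves every day
        self.recency_weight = recency_weight  # assumed blend factor in [0, 1]

    def add(self, episode):
        # Appending new trajectories is what lets the memory grow without retraining.
        self.episodes.append(episode)

    def retrieve(self, query, now, k=3):
        """Return up to k (score, episode) pairs, highest score first."""
        w = self.recency_weight
        scored = []
        for ep in self.episodes:
            age = max(0.0, now - ep.timestamp)
            recency = 0.5 ** (age / self.half_life)
            score = (1.0 - w) * cosine(query, ep.embedding) + w * recency
            scored.append((score, ep))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def coach_advice(store, query, now, threshold=0.6):
    """Return advice text to inject, or None when no episode is relevant enough."""
    hits = [(s, ep) for s, ep in store.retrieve(query, now) if s >= threshold]
    if not hits:
        return None  # the coach stays silent rather than inject weak advice
    lines = ["Relevant past experience:"]
    for _, ep in hits:
        outcome = "succeeded" if ep.success else "failed"
        lines.append(f"- ({outcome}) {ep.summary}")
    return "\n".join(lines)
```

In an agent loop, a runtime hook would call `coach_advice` before each step and add the returned text to the prompt only when it is not `None`. The threshold is what models the decision of whether to intervene at all in this sketch.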