WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

Abstract

Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, completing complex browsing tasks across diverse domains. However, current agents repeat the same errors and cannot learn from past experience across sessions, which limits their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling improved long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency, and decides whether to inject task-specific advice into the agent via runtime hooks. This design lets web agents access long-term memory beyond their native context window, improving robustness in complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, so agents improve over time without retraining. Evaluations on the WebVoyager benchmark show that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it raises the task success rate from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models with WebCoach achieve performance comparable to the same web agent using GPT-4o.
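
The abstract says the Coach retrieves experiences by similarity and recency and then decides whether to inject advice, but it does not give the scoring rule. The sketch below is one possible reading, not the authors' implementation. The Episode schema, the bag-of-words similarity (a stand-in for a learned embedding), the exponential recency decay, the weights, and the injection threshold are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    """One condensed past trajectory (hypothetical schema)."""
    summary: str      # condensed description of the task and its outcome
    advice: str       # lesson distilled from the episode
    timestamp: float  # time the episode was stored, in seconds


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; assumed stand-in for an embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query: str, episode: Episode, now: float,
          half_life: float = 86400.0, recency_weight: float = 0.3) -> float:
    """Weighted mix of similarity and recency (assumed form and weights)."""
    age = max(0.0, now - episode.timestamp)
    recency = 0.5 ** (age / half_life)  # halves every `half_life` seconds
    similarity = cosine_similarity(query, episode.summary)
    return (1.0 - recency_weight) * similarity + recency_weight * recency


def coach(query: str, memory: List[Episode], now: float,
          threshold: float = 0.5) -> Optional[str]:
    """Return advice from the best-scoring episode, or None to stay silent."""
    if not memory:
        return None
    best = max(memory, key=lambda e: score(query, e, now))
    return best.advice if score(query, best, now) >= threshold else None
```

In this sketch, a runtime hook would call coach with the current task description before each agent step and prepend the returned advice to the prompt only when it is not None. The threshold stops weakly related episodes from being injected as noise.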
