WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

Abstract

Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, completing complex browsing tasks across diverse domains. However, current agents repeat the same errors and cannot learn from past experience across sessions, which limits their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic, self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling better long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency and decides whether to inject task-specific advice into the agent via runtime hooks. This design gives web agents access to long-term memory beyond their native context window, improving robustness on complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, so agents improve over time without retraining. Evaluations on the WebVoyager benchmark show that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it raises the task success rate from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models equipped with WebCoach achieve performance comparable to the same web agent using GPT-4o.
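The condense, store, retrieve, and decide loop described above can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the Episode schema, the bag-of-words embed function (standing in for a learned embedding), the linear mix of cosine similarity and exponential recency decay, the alpha, half_life, and threshold values, and the before_step_hook name are all invented for illustration, and condense replaces the model-based WebCondenser with plain string formatting.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Episode:
    # Hypothetical schema for one condensed trajectory (an episodic experience).
    task: str
    summary: str
    success: bool
    timestamp: float  # seconds, used for recency scoring
    vector: Counter = field(default_factory=Counter)


def embed(text: str) -> Counter:
    # Assumption: bag-of-words term counts stand in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def condense(task: str, steps: List[str], success: bool) -> str:
    # Placeholder WebCondenser: keeps the action list and the outcome instead of
    # summarizing a raw navigation log with a model.
    outcome = "succeeded" if success else "failed"
    return f"Task '{task}' {outcome} after {len(steps)} steps: " + "; ".join(steps)


class MemoryStore:
    """External store of episodic experiences, appended to after every session."""

    def __init__(self) -> None:
        self.episodes: List[Episode] = []

    def add(self, task: str, steps: List[str], success: bool, timestamp: float) -> None:
        summary = condense(task, steps, success)
        self.episodes.append(Episode(task, summary, success, timestamp, embed(task)))

    def retrieve(self, task: str, now: float, k: int = 2, alpha: float = 0.8,
                 half_life: float = 3600.0) -> List[Tuple[float, Episode]]:
        # Score = alpha * cosine similarity + (1 - alpha) * recency, where
        # recency halves every `half_life` seconds; both terms lie in [0, 1].
        query = embed(task)
        scored = []
        for ep in self.episodes:
            recency = 0.5 ** ((now - ep.timestamp) / half_life)
            scored.append((alpha * cosine(query, ep.vector) + (1 - alpha) * recency, ep))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


class Coach:
    def __init__(self, store: MemoryStore, threshold: float = 0.5) -> None:
        self.store = store
        self.threshold = threshold

    def advise(self, task: str, now: float) -> Optional[str]:
        # Return advice only when at least one retrieved experience clears the threshold.
        hits = [ep for score, ep in self.store.retrieve(task, now) if score >= self.threshold]
        if not hits:
            return None
        return "\n".join(("Repeat: " if ep.success else "Avoid: ") + ep.summary for ep in hits)


def before_step_hook(prompt: str, coach: Coach, task: str, now: float) -> str:
    # Hypothetical runtime hook: runs before an agent step and appends advice
    # to the prompt only if the Coach returns any.
    advice = coach.advise(task, now)
    return prompt if advice is None else f"{prompt}\n\n[Coach advice]\n{advice}"


if __name__ == "__main__":
    store = MemoryStore()
    store.add("find cheapest flight from Tokyo to Paris",
              ["open search page", "set dates", "sort by price"], True, timestamp=0.0)
    store.add("buy a book on amazon", ["search title", "click add to cart"], False, timestamp=1800.0)
    coach = Coach(store)
    print(before_step_hook("You are a web agent.", coach,
                           "find cheapest flight from Osaka to Paris", now=3600.0))
```

Gating on a score threshold lets the Coach stay silent when no stored experience is relevant, which mirrors the abstract's point that the Coach decides whether to inject advice rather than always doing so. In this sketch the flight query retrieves only the earlier flight episode; the unrelated shopping episode scores too low to be injected.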
