Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Abstract

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, in which agents continually induce skills from previous task trajectories and reuse them on the fly in future tasks. However, existing methods mainly reuse skills at the task level: a fixed set of skills is retrieved from the initial task instruction and kept unchanged throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initially retrieved skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: (1) a sliding-window extraction process that turns completed trajectories into reusable sub-procedures that can be invoked at intermediate execution states; (2) a dual text-code representation that connects skill retrieval with executable actions; and (3) a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
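
To make the contrast between task-level and stepwise retrieval concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bag-of-words cosine similarity as a stand-in for whatever retriever SGDR actually uses, and every name in it (`Skill`, `bow`, `cosine`, `sliding_windows`, `retrieve`, the weight `alpha`, and the toy skills and page descriptions) is hypothetical and introduced only for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # text side: matched against goal and page state
    code: str         # code side: what the agent would execute (illustrative)


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (assumed stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sliding_windows(trajectory: list, size: int) -> list:
    """Every contiguous run of `size` actions in a completed trajectory."""
    return [trajectory[i:i + size] for i in range(len(trajectory) - size + 1)]


def retrieve(skills: list, goal: str, page_state: str,
             k: int = 1, alpha: float = 0.5) -> list:
    """Rank skills by a weighted mix of goal and page-state similarity.

    alpha = 1.0 ignores the page state, which mimics task-level retrieval.
    """
    g, s = bow(goal), bow(page_state)

    def score(skill: Skill) -> float:
        d = bow(skill.description)
        return alpha * cosine(d, g) + (1 - alpha) * cosine(d, s)

    return sorted(skills, key=score, reverse=True)[:k]


# Toy skill library and task (all hypothetical).
skills = [
    Skill("search_product",
          "type a query into the search box on the store home page",
          "fill('search', query); click('Search')"),
    Skill("apply_price_filter",
          "set the price filter on a product results list",
          "click('Price'); fill('max', limit)"),
]
goal = "find a laptop under 500 dollars in the store"
home_page = "store home page with search box"
results_page = "product results list with price filter sidebar"

# Stepwise retrieval: the selected skill follows the page state.
step0 = retrieve(skills, goal, home_page)[0].name
step1 = retrieve(skills, goal, results_page)[0].name

# Task-level retrieval: the page state is ignored, so the choice never changes.
static = retrieve(skills, goal, results_page, alpha=1.0)[0].name
```

In this toy setting, `step0` is `search_product` and `step1` is `apply_price_filter`, whereas `static` stays at `search_product` even after the page has moved on to the results list. `sliding_windows` only illustrates the idea of cutting a finished trajectory into short candidate sub-procedures; how SGDR filters, names, and stores them is not specified in the abstract.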