
Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

June 3, 2026
Authors: Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu
cs.AI

Abstract

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a set of skills is retrieved from the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often evolves into situations that the initially retrieved skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable actions, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
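
To make the three components concrete, the following is a minimal Python sketch of the general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the `Skill` fields, the helper names `extract_skills` and `retrieve`, the window size, the weight `alpha`, and the use of token overlap as a stand-in for a real retriever. States and actions are plain strings rather than real webpage observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Text side (used for retrieval): source task goal and the state the skill starts from.
    goal: str
    start_state: str
    # Code side (used for execution): here just action strings, purely illustrative.
    code: tuple


def overlap(a, b):
    """Jaccard overlap of whitespace tokens; a toy stand-in for a learned retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def extract_skills(goal, trajectory, window=2):
    """Slide a fixed-size window over the (state, action) steps of a completed trajectory.

    Each window becomes one skill that can be invoked from its starting state.
    """
    skills = []
    for i in range(len(trajectory) - window + 1):
        chunk = trajectory[i:i + window]
        start_state = chunk[0][0]
        code = tuple(action for _, action in chunk)
        skills.append(Skill(goal, start_state, code))
    return skills


def retrieve(skills, goal, state, k=1, alpha=0.5):
    """Rank skills by a weighted mix of goal match and current-state match."""
    def score(skill):
        return (alpha * overlap(skill.goal, goal)
                + (1 - alpha) * overlap(skill.start_state, state))
    return sorted(skills, key=score, reverse=True)[:k]


# A completed trajectory from an earlier (hypothetical) task.
trajectory = [
    ("shopping home page", "click('search box')"),
    ("search results list", "click('first product')"),
    ("product detail page", "click('add to cart')"),
]
library = extract_skills("buy a laptop", trajectory, window=2)

# Same new goal, two different current states: the retrieved skill changes with the state.
at_home = retrieve(library, goal="buy a phone", state="shopping home page")[0]
at_results = retrieve(library, goal="buy a phone", state="search results list")[0]
```

In this sketch the retrieval call is repeated at every step with the current state, so `at_home` and `at_results` are different skills even though the goal is the same. A task-level scheme would instead call `retrieve` once from the goal alone and keep that result for the whole episode.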