Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
June 3, 2026
Authors: Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu
cs.AI
Abstract
Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, in which agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a set of skills is retrieved from the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, and that state often transitions into situations the initially retrieved skills do not cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: (1) a sliding-window extraction process that turns completed trajectories into reusable sub-procedures that can be invoked at intermediate execution states; (2) a dual text-code representation that connects skill retrieval with executable actions; and (3) a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, which correspond to relative gains of 10.6% and 10.0%, respectively, over the strongest baseline. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
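To make the three components concrete, the following is a minimal sketch and not the authors' implementation. It assumes a trajectory is a list of (page description, action string) pairs, and it uses plain token overlap where a real system would presumably use a learned retriever. The names `Skill`, `extract_skills`, `retrieve`, `window`, and `state_weight` are all hypothetical and do not come from the paper or its repository.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    task: str          # instruction of the trajectory the skill came from
    start_state: str   # text side: description of the page where the skill begins
    code: list         # code side: actions to replay from that page


def tokens(text):
    """Lowercased word set, a crude stand-in for a learned text encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def extract_skills(task, trajectory, window=2):
    """Slide a fixed-size window over (state, action) steps.

    Every window becomes one skill keyed by its first state, so skills can
    start at intermediate pages and not only at the beginning of a task.
    """
    skills = []
    for i in range(len(trajectory) - window + 1):
        steps = trajectory[i:i + window]
        skills.append(Skill(task, steps[0][0], [action for _, action in steps]))
    return skills


def retrieve(skills, goal, state, k=1, state_weight=0.5):
    """Return the k skills that best match both the goal and the current state."""
    def score(skill):
        return ((1 - state_weight) * overlap(skill.task, goal)
                + state_weight * overlap(skill.start_state, state))
    return sorted(skills, key=score, reverse=True)[:k]


# A toy completed trajectory (illustrative data, not from WebArena).
trajectory = [
    ("search page with query box", "type('red mug')"),
    ("results list of products", "click('product')"),
    ("product page with add to cart button", "click('add to cart')"),
    ("cart page with checkout button", "click('checkout')"),
]
library = extract_skills("buy a red mug", trajectory)

# Retrieval is called once per step with the page the agent currently sees.
for page in ("search page with query box", "product page with add to cart button"):
    best = retrieve(library, "buy a blue mug", page)[0]
    print(page, "->", best.code)
```

In this sketch, setting `state_weight=0` ranks skills by the task instruction alone, which roughly corresponds to the task-level retrieval the abstract argues against; a nonzero weight lets the selected skill change as the page changes.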