Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Abstract

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a fixed set of skills is retrieved based on the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initial skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable actions, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
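
To make the three components concrete, the following is a minimal illustrative sketch and not the released implementation. It assumes a trajectory is a list of (state description, action string) pairs, uses a fixed window size, and scores skills with a simple token-overlap (Jaccard) similarity as a stand-in for whatever retriever the method actually uses. The names Skill, extract_skills, retrieve, and the weight alpha are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Skill:
    description: str    # text side: what the retriever matches against
    actions: list[str]  # code side: the executable action sequence


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens (a deliberately crude text representation)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; an assumed stand-in for a learned retriever."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_skills(trajectory: list[tuple[str, str]], window: int = 2) -> list[Skill]:
    """Slide a fixed-size window over a completed trajectory.

    Each window becomes one sub-procedure, so a skill can start from an
    intermediate state rather than only from the beginning of a task.
    """
    skills = []
    for i in range(len(trajectory) - window + 1):
        chunk = trajectory[i:i + window]
        description = " ".join(state for state, _ in chunk)
        actions = [action for _, action in chunk]
        skills.append(Skill(description, actions))
    return skills


def retrieve(skills: list[Skill], goal: str, state: str,
             k: int = 1, alpha: float = 0.5) -> list[Skill]:
    """Rank skills by similarity to both the task goal and the current page state."""
    def score(skill: Skill) -> float:
        desc = tokens(skill.description)
        return (alpha * jaccard(desc, tokens(goal))
                + (1 - alpha) * jaccard(desc, tokens(state)))
    return sorted(skills, key=score, reverse=True)[:k]


# Toy trajectory with made-up states and actions.
trajectory = [
    ("login page", "fill('user')"),
    ("search page", "type('shoes')"),
    ("results page", "click('first')"),
]
skills = extract_skills(trajectory, window=2)

# Retrieval is called again at every step with the current state,
# rather than once with the initial task instruction.
best = retrieve(skills, goal="buy shoes", state="results page", k=1)[0]
print(best.actions)
```

In this sketch the agent would call retrieve again after each page transition, so the skill set can change as the webpage state changes, which is the contrast with task-level retrieval drawn above.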