Online Skill Learning through State-Grounded Dynamic Retrieval for Web Agents

Abstract

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a fixed set of skills is retrieved based on the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initial skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable actions, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
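The sketch below illustrates how the three components described in the abstract could fit together. It is not the authors' implementation, which is in the linked repository. The names `Skill`, `sliding_window_extract`, and `retrieve` are hypothetical, as are the weighting parameter `alpha` and the toy trajectory. Bag-of-words cosine similarity stands in for whatever embedding or LLM-based matching SGDR actually uses. Skill descriptions are formed by joining action strings rather than by an LLM-written summary. The point it shows is that retrieval is re-run at every step against the current page state, so one task goal can surface different skills on different pages.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    # Hypothetical dual representation: `description` is the text side used for
    # retrieval; `actions` stands in for the executable code side.
    description: str
    start_state: str  # page state where this sub-procedure began in its source trajectory
    actions: list[str]


def sliding_window_extract(states: list[str], actions: list[str], window: int = 2) -> list[Skill]:
    """Slide a fixed-size window over a completed trajectory.

    Every run of `window` consecutive actions becomes a candidate skill that can
    be invoked from an intermediate state, not only from the task's start.
    Assumption: the description is the joined action strings, standing in for
    an LLM-written summary.
    """
    skills = []
    for start in range(len(actions) - window + 1):
        chunk = actions[start:start + window]
        skills.append(Skill("; ".join(chunk), states[start], chunk))
    return skills


def _tokens(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for learned embeddings)."""
    va, vb = _tokens(a), _tokens(b)
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve(library: list[Skill], goal: str, current_state: str,
             k: int = 1, alpha: float = 0.5) -> list[Skill]:
    """Rank skills against both the task goal and the current page state.

    Intended to be called at every step, so the retrieved set follows the page
    as it changes, unlike task-level retrieval fixed by the initial instruction.
    """
    scored = [
        (alpha * _cosine(goal, s.description)
         + (1 - alpha) * _cosine(current_state, s.start_state), s)
        for s in library
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in scored[:k]]


# Toy trajectory (hypothetical): the page state observed before each action.
states = ["home page search box", "search results list", "product page add to cart"]
actions = ["type query in search box", "click first result", "click add to cart"]
library = sliding_window_extract(states, actions, window=2)

goal = "add a laptop to cart"
for page in ["home page search box", "search results list"]:
    best = retrieve(library, goal, page)[0]
    print(page, "->", best.actions)
```

With the goal held fixed, the home page retrieves the skill that starts with typing a query. The results page retrieves the skill that clicks a result and adds it to the cart. A task-level retriever would return the same skill set for both pages. The linear mix controlled by `alpha` is only one simple way to combine goal and state relevance.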