SkillWeaver: Web Agents Can Self-Improve by Discovering and Honing Skills

Abstract

To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms: exploring the environment, hierarchically abstracting experiences into reusable skills, and collaboratively constructing an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural knowledge abstraction, skill refinement, and skill composition. In this work, we introduce SkillWeaver, a skill-centric framework that enables agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.
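The discover, practice, and distill cycle described above can be pictured as a simple loop that grows a skill library. The following is only a minimal sketch under stated assumptions, not the SkillWeaver implementation: `SkillLibrary`, `explore`, and the three callbacks (`propose_skill`, `practice`, `distill`) are hypothetical names introduced for illustration, and the toy callbacks stand in for what would be an LLM-driven agent acting in a browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SkillLibrary:
    """Hypothetical container mapping skill names to callable APIs."""
    apis: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def add(self, name: str, fn: Callable[..., object]) -> None:
        self.apis[name] = fn

    def names(self) -> List[str]:
        return sorted(self.apis)


def explore(
    website: str,
    propose_skill: Callable[[str, List[str]], Optional[str]],
    practice: Callable[[str, str], Optional[List[str]]],
    distill: Callable[[str, List[str]], Callable[..., object]],
    iterations: int,
) -> SkillLibrary:
    """Assumed loop: discover a skill, practice it, distill it into an API."""
    library = SkillLibrary()
    for _ in range(iterations):
        # Discover: propose a skill that is not yet in the library.
        skill = propose_skill(website, library.names())
        if skill is None:
            break  # nothing new to learn on this site
        # Practice: try to carry out the skill and record the steps taken.
        trajectory = practice(website, skill)
        if trajectory is None:
            continue  # practice failed; retry on a later iteration
        # Distill: turn the successful trajectory into a reusable callable.
        library.add(skill, distill(skill, trajectory))
    return library


# Toy stand-ins (assumptions) so the sketch runs without a browser or model.
def toy_propose(website: str, known: List[str]) -> Optional[str]:
    candidates = ["add_to_cart", "search_products"]
    remaining = [c for c in candidates if c not in known]
    return remaining[0] if remaining else None


def toy_practice(website: str, skill: str) -> Optional[List[str]]:
    return [f"open {website}", f"perform {skill}"]


def toy_distill(skill: str, trajectory: List[str]) -> Callable[..., object]:
    def api(**kwargs: object) -> Dict[str, object]:
        return {"skill": skill, "steps": list(trajectory), "args": kwargs}
    return api


library = explore("shop.example", toy_propose, toy_practice, toy_distill, iterations=5)
print(library.names())
```

Because the library in this sketch is just a mapping from names to callables, it could in principle be handed to a different agent unchanged, which is one way to read the abstract's claim that synthesized APIs transfer between agents.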