SkillWeaver: Webエージェントはスキルを発見・磨くことで自己改善できる

要旨

複雑な環境で生き残り繁栄するために、人間は環境探索を通じて洗練された自己改善メカニズムを進化させてきました。これには、経験を再利用可能なスキルとして階層的に抽象化し、協力的に拡大し続けるスキルレパートリーを構築することが含まれます。最近の進歩にもかかわらず、自律的なウェブエージェントは依然として重要な自己改善能力を欠いており、手続き的知識の抽象化、スキルの洗練、スキルの構成に苦戦しています。本研究では、SkillWeaverを紹介します。これは、再利用可能なスキルをAPIとして自律的に合成することでエージェントが自己改善できるスキル中心のフレームワークです。新しいウェブサイトが与えられると、エージェントは自律的にスキルを発見し、それらを実行して練習し、練習経験を堅牢なAPIに蒸留します。反復的な探索により、軽量でプラグアンドプレイ可能なAPIのライブラリが継続的に拡大され、エージェントの能力が大幅に向上します。WebArenaと実世界のウェブサイトでの実験により、SkillWeaverの有効性が実証され、それぞれ31.8%と39.8%の相対的成功率向上が達成されました。さらに、強力なエージェントによって合成されたAPIは、転移可能なスキルを通じて弱いエージェントを大幅に強化し、WebArenaで最大54.3%の改善をもたらしました。これらの結果は、多様なウェブサイトインタラクションをAPIに磨き上げ、それらをさまざまなウェブエージェント間でシームレスに共有できることの有効性を示しています。

English

To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reuseable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural knowledge abstraction, refining skills, and skill composition. In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.