SteP: Stacked LLM Policies for Web Actions

Abstract

Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying one large prompt to handle all possible behaviors and states is extremely complex, and causes behavior to leak between unrelated behaviors. Decomposition into distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach that dynamically composes policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments, including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves over state-of-the-art methods that use GPT-4 policies (14.9% to 33.5%), while on MiniWoB++, SteP is competitive with prior work while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.
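
The idea of a control state held as a stack of policies can be illustrated with a small sketch. This is not the authors' implementation: the names `Act`, `Call`, `Return`, `Frame`, and `run`, and the two toy policies, are hypothetical, and hand-written rules stand in for the LLM policies. The sketch only shows how a policy can either emit a web action, call another policy (push), or return control to its caller (pop).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Act:
    """A primitive web action sent to the environment (hypothetical format)."""
    command: str


@dataclass
class Call:
    """Hand control to another policy by pushing it onto the stack."""
    policy: str
    instruction: str


@dataclass
class Return:
    """Finish the current policy and pop it off the stack."""
    value: str


@dataclass
class Frame:
    """One stack entry: the active policy, its instruction, and its local history."""
    policy: str
    instruction: str
    history: List[str] = field(default_factory=list)


Action = Union[Act, Call, Return]
Policy = Callable[[Frame, str], Action]


def run(policies: Dict[str, Policy], root: str, instruction: str,
        env_step: Callable[[str], str], observation: str,
        max_steps: int = 20) -> Tuple[List[str], Optional[str]]:
    """Step the top-of-stack policy until the stack is empty or the budget runs out."""
    stack = [Frame(root, instruction)]
    trace: List[str] = []
    result: Optional[str] = None
    for _ in range(max_steps):
        if not stack:
            break
        frame = stack[-1]
        action = policies[frame.policy](frame, observation)
        if isinstance(action, Call):
            stack.append(Frame(action.policy, action.instruction))
        elif isinstance(action, Return):
            stack.pop()
            if stack:
                stack[-1].history.append(action.value)  # the caller sees the result
            else:
                result = action.value
        else:
            trace.append(action.command)
            observation = env_step(action.command)
            frame.history.append(action.command)
    return trace, result


# Toy rule-based policies standing in for LLM policies (assumption).
def search_policy(frame: Frame, obs: str) -> Action:
    if not frame.history:
        return Call("fill_text", "search_box=shoes")
    if frame.history[-1] == "typed":
        return Act("click [search_button]")
    return Return("done")


def fill_text_policy(frame: Frame, obs: str) -> Action:
    if not frame.history:
        target, text = frame.instruction.split("=")
        return Act(f"type [{target}] {text}")
    return Return("typed")


POLICIES = {"search": search_policy, "fill_text": fill_text_policy}
trace, result = run(POLICIES, "search", "find shoes",
                    env_step=lambda cmd: f"page after: {cmd}",
                    observation="home page")
```

In this sketch the stack depth is decided at run time by the policies themselves rather than fixed in advance, which is the contrast with a static hierarchy drawn in the abstract.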
