ChatPaper.ai


SteP: Stacked LLM Policies for Web Actions

October 5, 2023
Authors: Paloma Sodhi, S. R. K. Branavan, Ryan McDonald
cs.AI

Abstract

Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposing the task into distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach that dynamically composes policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves (14.9% to 33.5%) over the SOTA that uses GPT-4 policies, while on MiniWoB++, SteP is competitive with prior work while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.
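The abstract describes a control state that is a stack of policy calls: the policy on top of the stack acts until it either hands off control by calling a sub-policy (push) or finishes and returns control to its caller (pop). The following is a minimal sketch of that control loop under assumed names; `Policy`, the `("act" | "call" | "done")` decision tuples, and the behaviors are illustrative assumptions, not the paper's actual API (which would place LLM prompts behind each policy).

```python
class Policy:
    """A policy maps an observation to a decision: a primitive web
    action, a call to a sub-policy, or completion of its own task."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior  # callable: observation -> (kind, payload)

    def step(self, observation):
        return self.behavior(observation)


def run_step(root_policy, observations):
    """Run the stacked-policy loop: the control state is a stack of
    policy calls; only the top policy acts at each step."""
    stack = [root_policy]
    actions = []
    for obs in observations:
        if not stack:
            break
        kind, payload = stack[-1].step(obs)
        if kind == "act":      # emit a primitive web action
            actions.append(payload)
        elif kind == "call":   # hand off control: push the sub-policy
            stack.append(payload)
        elif kind == "done":   # return control: pop back to the caller
            stack.pop()
    return actions


# Usage: a root policy delegates a login step to a sub-policy, which
# acts until it observes it is logged in, then pops itself.
login = Policy(
    "login",
    lambda obs: ("done", None) if obs == "logged_in" else ("act", f"type {obs}"),
)
root = Policy(
    "root",
    lambda obs: ("call", login) if obs == "start" else ("done", None),
)

print(run_step(root, ["start", "username", "logged_in", "end"]))
# → ['type username']
```

Because the stack depth is not fixed in advance, the same loop supports both flat and deeply nested control flow, which is the sense in which the composition is dynamic rather than a static hierarchy.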