SteP: Stacked LLM Policies for Web Actions

Abstract

Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying one large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposing the problem into distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach that dynamically composes policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves (14.9% to 33.5%) over prior state-of-the-art methods that use GPT-4 policies, while on MiniWoB++, SteP is competitive with prior work while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.
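The idea of a policy stack as control state can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Act`, `Call`, `Return`, `Frame`, and `run`, the two toy policies, and the string-based history are all hypothetical, and the hard-coded policies stand in for what would be LLM-prompted policies. The sketch only shows the control flow the abstract describes: the top policy on the stack either issues a web action, calls another policy (push), or returns control to its caller (pop).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Act:        # primitive action sent to the web environment
    command: str


@dataclass
class Call:       # hand control to another policy (push onto the stack)
    policy: str
    instruction: str


@dataclass
class Return:     # finish and hand control back to the caller (pop)
    result: str


Step = Union[Act, Call, Return]
Policy = Callable[[str, List[str]], Step]  # (instruction, history) -> Step


@dataclass
class Frame:      # one stack entry: a policy plus its local history
    policy: str
    instruction: str
    history: List[str] = field(default_factory=list)


def run(policies: Dict[str, Policy], root: str, instruction: str,
        env_step: Callable[[str], str],
        max_steps: int = 20) -> Tuple[List[str], Optional[str]]:
    """Run the top-of-stack policy until the root policy returns."""
    stack = [Frame(root, instruction)]
    trace: List[str] = []
    for _ in range(max_steps):
        frame = stack[-1]
        step = policies[frame.policy](frame.instruction, frame.history)
        if isinstance(step, Act):
            observation = env_step(step.command)
            frame.history.append(f"act:{step.command} -> {observation}")
            trace.append(step.command)
        elif isinstance(step, Call):
            frame.history.append(f"call:{step.policy}")
            stack.append(Frame(step.policy, step.instruction))
        else:  # Return
            stack.pop()
            if not stack:
                return trace, step.result
            stack[-1].history.append(f"return:{step.result}")
    return trace, None  # step budget exhausted


# Toy, hard-coded policies (assumptions for illustration only).
def task_policy(instruction: str, history: List[str]) -> Step:
    returns = [h for h in history if h.startswith("return:")]
    if not returns:
        return Call("search", instruction)
    return Return("done after " + returns[0][len("return:"):])


def search_policy(instruction: str, history: List[str]) -> Step:
    if len(history) == 0:
        return Act(f"type [search box] {instruction}")
    if len(history) == 1:
        return Act("click [search button]")
    return Return("results shown")


POLICIES: Dict[str, Policy] = {"task": task_policy, "search": search_policy}

if __name__ == "__main__":
    actions, result = run(POLICIES, "task", "red shoes", lambda cmd: "ok")
    print(actions, result)
```

Because each frame keeps its own instruction and history, a policy only sees context relevant to its own subtask, and the depth of the stack is decided at run time by the calls the policies make rather than by a fixed hierarchy.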
