Agent Workflow Memory
September 11, 2024
Authors: Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
cs.AI
Abstract
Despite the potential of language model-based agents to solve real-world
tasks such as web navigation, current methods still struggle with long-horizon
tasks with complex action trajectories. In contrast, humans can flexibly solve
complex tasks by learning reusable task workflows from past experiences and
using them to guide future actions. To build agents that can similarly benefit
from this process, we introduce Agent Workflow Memory (AWM), a method for
inducing commonly reused routines, i.e., workflows, and selectively providing
workflows to the agent to guide subsequent generations. AWM flexibly applies to
both offline and online scenarios, where agents induce workflows from training
examples beforehand or from test queries on the fly. We experiment on two major
web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover
1000+ tasks from 200+ domains across travel, shopping, and social media, among
others. AWM substantially improves the baseline results by 24.6% and 51.1%
relative success rate on Mind2Web and WebArena while reducing the number of
steps taken to solve WebArena tasks successfully. Furthermore, online AWM
robustly generalizes in cross-task, website, and domain evaluations, surpassing
baselines by 8.9 to 14.0 absolute points as train-test task distribution gaps
widen.
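The online setting described in the abstract (inducing workflows from test queries on the fly and feeding them back to the agent) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Workflow`, `WorkflowMemory`, and `run_online`, and the `solve`, `judge`, and `induce` callables, are all hypothetical placeholders. In particular, the sketch assumes that workflows are induced only from trajectories judged successful and that the whole memory is passed to the agent as text, which simplifies the "selective" provision the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Trajectory = List[str]  # a trajectory is modeled here as a list of action strings


@dataclass
class Workflow:
    """A reusable routine: a short description plus abstracted action steps."""
    description: str
    steps: List[str]


@dataclass
class WorkflowMemory:
    """Hypothetical store of induced workflows (not the paper's data structure)."""
    workflows: List[Workflow] = field(default_factory=list)

    def add(self, wf: Workflow) -> None:
        # Skip workflows whose steps are already stored.
        if all(wf.steps != w.steps for w in self.workflows):
            self.workflows.append(wf)

    def as_prompt(self) -> str:
        # Render the memory as plain text to prepend to the agent's context.
        lines = []
        for wf in self.workflows:
            lines.append(f"Workflow: {wf.description}")
            lines.extend(f"  - {step}" for step in wf.steps)
        return "\n".join(lines)


def run_online(
    queries: List[str],
    solve: Callable[[str, str], Trajectory],             # assumed agent call
    judge: Callable[[str, Trajectory], bool],            # assumed success check
    induce: Callable[[str, Trajectory], List[Workflow]], # assumed induction step
    memory: WorkflowMemory,
) -> List[bool]:
    """Process test queries in a stream, growing the memory as tasks succeed."""
    results = []
    for query in queries:
        trajectory = solve(query, memory.as_prompt())
        success = judge(query, trajectory)
        if success:
            for wf in induce(query, trajectory):
                memory.add(wf)
        results.append(success)
    return results
```

In an offline variant, the same `induce` step would instead be run once over training examples before any test query is seen, and the resulting memory would stay fixed at test time.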