Towards Internet-Scale Training For Agents
February 10, 2025
Authors: Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov
cs.AI
Abstract
The predominant approach for training web navigation agents gathers human
demonstrations for a set of popular websites and hand-written tasks, but it is
becoming clear that human data are an inefficient resource. We develop a
pipeline to facilitate Internet-scale training for agents without laborious
human annotations. In the first stage, an LLM generates tasks for 150k diverse
websites. In the next stage, LLM agents complete tasks and produce
trajectories. In the final stage, an LLM reviews the trajectories and judges
their success. Language models are competitive with human annotators, detecting
and filtering out harmful content with an accuracy of 97%, generating feasible
tasks with an 89% rate, and judging successful trajectories with an 82.6%
accuracy. Scaling the pipeline, agents based on Llama 3.1 70B solve 16.7% of
tasks for 150k sites. Training on the data generated by our pipeline is
competitive with training on human demonstrations. In data-limited settings
derived from Mind2Web and WebLINX, we improve Step Accuracy by up to +89.5% and
+122.1% respectively for agents trained on mixtures of data from our pipeline
and human data. When training agents with all available human data from these
benchmarks, agents fail to generalize to diverse real sites, and adding our
data improves their generalization by +149.0% for WebLINX and +156.3% for
Mind2Web. Code will be available at: data-for-agents.github.io.
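The three-stage pipeline the abstract describes (an LLM proposes tasks per site, an LLM agent rolls out trajectories, and an LLM judge filters for success) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names and the stub LLM/agent callables are hypothetical placeholders.

```python
# Hypothetical sketch of the three-stage data pipeline from the abstract.
# `propose_llm`, `agent`, and `judge_llm` stand in for real model calls.

def propose_tasks(site, propose_llm):
    """Stage 1: an LLM proposes candidate tasks for a website,
    dropping empty/infeasible proposals."""
    return [task for task in propose_llm(site) if task]

def rollout(task, agent):
    """Stage 2: an LLM agent attempts the task and records a trajectory
    (here, just the task and the action sequence)."""
    return {"task": task, "actions": agent(task)}

def is_successful(trajectory, judge_llm):
    """Stage 3: an LLM judge reviews the trajectory and labels success."""
    return judge_llm(trajectory)

def pipeline(sites, propose_llm, agent, judge_llm):
    """Run all three stages; keep only judged-successful trajectories
    as training data."""
    data = []
    for site in sites:
        for task in propose_tasks(site, propose_llm):
            trajectory = rollout(task, agent)
            if is_successful(trajectory, judge_llm):
                data.append(trajectory)
    return data
```

Under this sketch, scaling simply means running `pipeline` over many sites (150k in the paper) and mixing the surviving trajectories with any available human demonstrations for training.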