Towards Internet-Scale Training For Agents
February 10, 2025
Authors: Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov
cs.AI
Abstract
The predominant approach for training web navigation agents gathers human
demonstrations for a set of popular websites and hand-written tasks, but it is
becoming clear that human data are an inefficient resource. We develop a
pipeline to facilitate Internet-scale training for agents without laborious
human annotations. In the first stage, an LLM generates tasks for 150k diverse
websites. In the next stage, LLM agents complete tasks and produce
trajectories. In the final stage, an LLM reviews the trajectories and judges
their success. Language models are competitive with human annotators, detecting
and filtering out harmful content with an accuracy of 97%, generating feasible
tasks with an 89% rate, and judging successful trajectories with an 82.6%
accuracy. Scaling the pipeline, agents based on Llama 3.1 70B solve 16.7% of
tasks for 150k sites. Training on the data generated by our pipeline is
competitive with training on human demonstrations. In data-limited settings
derived from Mind2Web and WebLINX, we improve Step Accuracy by up to +89.5% and
+122.1% respectively for agents trained on mixtures of data from our pipeline
and human data. When training agents with all available human data from these
benchmarks, agents fail to generalize to diverse real sites, and adding our
data improves their generalization by +149.0% for WebLINX and +156.3% for
Mind2Web. Code will be available at: data-for-agents.github.io.
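The three-stage pipeline the abstract describes (an LLM proposes tasks per site, an LLM agent rolls out trajectories, and an LLM judge filters for success) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names and the stub LLM/agent callables are hypothetical placeholders.

```python
# Hypothetical sketch of the three-stage data pipeline from the abstract.
# `propose_llm`, `agent`, and `judge_llm` stand in for real model calls.

def propose_tasks(site, propose_llm):
    """Stage 1: an LLM proposes candidate tasks for a website,
    dropping empty/infeasible proposals."""
    return [task for task in propose_llm(site) if task]

def rollout(task, agent):
    """Stage 2: an LLM agent attempts the task and records a trajectory
    (here, just the task and the action sequence)."""
    return {"task": task, "actions": agent(task)}

def is_successful(trajectory, judge_llm):
    """Stage 3: an LLM judge reviews the trajectory and labels success."""
    return judge_llm(trajectory)

def pipeline(sites, propose_llm, agent, judge_llm):
    """Run all three stages; keep only judged-successful trajectories
    as training data."""
    data = []
    for site in sites:
        for task in propose_tasks(site, propose_llm):
            trajectory = rollout(task, agent)
            if is_successful(trajectory, judge_llm):
                data.append(trajectory)
    return data
```

Under this sketch, scaling simply means running `pipeline` over many sites (150k in the paper) and mixing the surviving trajectories with any available human demonstrations for training.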