AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

Abstract

We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. Leveraging information asymmetry, AgentSynth constructs subtasks that are simple during generation but significantly more challenging when composed into long-horizon tasks, enabling the creation of over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based task proposer guided by a persona, followed by an execution agent that completes the task and logs the trajectory. This process is repeated iteratively to form a sequence of subtasks, which are then summarized by a separate agent into a composite task of controllable difficulty. A key strength of AgentSynth is its ability to precisely modulate task complexity by varying the number of subtasks. Empirical evaluations show that state-of-the-art LLM agents suffer a steep performance drop, from 18% success at difficulty level 1 to just 4% at level 6, highlighting the benchmark's difficulty and discriminative power. Moreover, our pipeline achieves a low average cost of $0.60 per trajectory, orders of magnitude cheaper than human annotations. Our code and data are publicly available at https://github.com/sunblaze-ucb/AgentSynth
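The propose, execute, and summarize loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `Subtask`, `CompositeTask`, `synthesize`, and the `toy_*` stand-ins are hypothetical, the LLM-backed agents are replaced by plain functions, and the reading that a level-n task summarizes the first n subtasks is an assumption based on the statement that difficulty is controlled by the number of subtasks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    description: str
    trajectory: List[str]  # actions logged by the execution agent


@dataclass
class CompositeTask:
    level: int             # number of subtasks folded into this task
    description: str
    subtasks: List[Subtask]


def synthesize(
    persona: str,
    propose: Callable[[str, List[Subtask]], str],
    execute: Callable[[str], List[str]],
    summarize: Callable[[List[Subtask]], str],
    max_level: int = 6,
) -> List[CompositeTask]:
    """Build one composite task per difficulty level (hypothetical sketch)."""
    history: List[Subtask] = []
    tasks: List[CompositeTask] = []
    for level in range(1, max_level + 1):
        # The proposer sees the persona and the subtasks completed so far.
        description = propose(persona, history)
        # The executor completes the subtask and logs its trajectory.
        trajectory = execute(description)
        history.append(Subtask(description, trajectory))
        # Assumption: a level-n task summarizes the first n subtasks.
        tasks.append(CompositeTask(level, summarize(history), list(history)))
    return tasks


# Toy stand-ins for the LLM-backed agents (assumptions, no model calls).
def toy_propose(persona: str, history: List[Subtask]) -> str:
    return f"step {len(history) + 1} for {persona}"


def toy_execute(description: str) -> List[str]:
    return [f"act: {description}"]


def toy_summarize(history: List[Subtask]) -> str:
    return " then ".join(s.description for s in history)


tasks = synthesize("a data analyst", toy_propose, toy_execute, toy_summarize, max_level=3)
for task in tasks:
    print(task.level, "|", task.description)
```

In this sketch each subtask is proposed and executed one step at a time, while the composite description hides the step boundaries. That is one way to picture why a task can be easy to generate yet harder to solve as a single long-horizon instruction.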
