AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
June 17, 2025
Authors: Jingxu Xie, Dylan Xu, Xuandong Zhao, Dawn Song
cs.AI
Abstract
We introduce AgentSynth, a scalable and cost-efficient pipeline for
automatically synthesizing high-quality tasks and trajectory datasets for
generalist computer-use agents. Leveraging information asymmetry, AgentSynth
constructs subtasks that are simple during generation but significantly more
challenging when composed into long-horizon tasks, enabling the creation of
over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based
task proposer guided by a persona, followed by an execution agent that
completes the task and logs the trajectory. This process is repeated
iteratively to form a sequence of subtasks, which are then summarized by a
separate agent into a composite task of controllable difficulty. A key strength
of AgentSynth is its ability to precisely modulate task complexity by varying
the number of subtasks. Empirical evaluations show that state-of-the-art LLM
agents suffer a steep performance drop, from 18% success at difficulty level 1
to just 4% at level 6, highlighting the benchmark's difficulty and
discriminative power. Moreover, our pipeline achieves a low average cost of
$0.60 per trajectory, orders of magnitude cheaper than human annotations. Our
code and data are publicly available at
https://github.com/sunblaze-ucb/AgentSynth
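The pipeline described in this abstract (persona-guided proposal, execution with trajectory logging, iteration into a subtask chain, then summarization into a composite task whose difficulty is set by the number of subtasks) can be outlined as a short Python sketch. It is not the authors' implementation; the repository above is the reference for that. The callables `propose`, `execute`, and `summarize`, the `Step` and `CompositeTask` types, and the `toy_*` stubs are hypothetical stand-ins for the LLM-based task proposer, execution agent, and summarizer agent. The sketch also assumes that difficulty level k is produced by summarizing the first k subtasks of a single chain, and that the composite trajectory is the concatenation of the subtask trajectories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    subtask: str            # natural-language subtask from the proposer
    trajectory: List[str]   # actions logged by the execution agent


@dataclass
class CompositeTask:
    level: int              # difficulty = number of chained subtasks
    description: str        # summary written by the summarizer agent
    trajectory: List[str]   # concatenated trajectories of the chained subtasks


# Hypothetical signatures standing in for LLM-backed agents.
ProposeFn = Callable[[str, List[str]], str]     # (persona, previous subtasks) -> next subtask
ExecuteFn = Callable[[str], List[str]]          # subtask -> logged actions
SummarizeFn = Callable[[str, List[str]], str]   # (persona, subtasks) -> composite task text


def build_chain(persona: str, n_subtasks: int,
                propose: ProposeFn, execute: ExecuteFn) -> List[Step]:
    """Iteratively propose a subtask, execute it, and log its trajectory."""
    steps: List[Step] = []
    for _ in range(n_subtasks):
        history = [s.subtask for s in steps]   # proposer sees earlier subtasks
        subtask = propose(persona, history)
        steps.append(Step(subtask, execute(subtask)))
    return steps


def compose_tasks(persona: str, steps: List[Step],
                  summarize: SummarizeFn) -> Dict[int, CompositeTask]:
    """Assumption: level k is the summary of the first k subtasks."""
    tasks: Dict[int, CompositeTask] = {}
    for k in range(1, len(steps) + 1):
        prefix = steps[:k]
        description = summarize(persona, [s.subtask for s in prefix])
        trajectory = [action for s in prefix for action in s.trajectory]
        tasks[k] = CompositeTask(k, description, trajectory)
    return tasks


# Toy stubs so the sketch runs without any model or desktop environment.
def toy_propose(persona: str, history: List[str]) -> str:
    return f"subtask {len(history) + 1}"


def toy_execute(subtask: str) -> List[str]:
    return [f"act({subtask})"]


def toy_summarize(persona: str, subtasks: List[str]) -> str:
    return f"As a {persona}: " + "; then ".join(subtasks)


if __name__ == "__main__":
    steps = build_chain("data analyst", 3, toy_propose, toy_execute)
    tasks = compose_tasks("data analyst", steps, toy_summarize)
    for level, task in tasks.items():
        print(level, task.description, task.trajectory)
```

In this sketch the information asymmetry the abstract mentions appears as a difference in what each side sees: the proposer and executor only ever handle one small step with the earlier steps in view, while a solver given the level-k description must plan all k steps from the summary alone. Varying `n_subtasks` (or selecting a level from the returned dictionary) is the single knob that controls difficulty.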