AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
June 17, 2025
Authors: Jingxu Xie, Dylan Xu, Xuandong Zhao, Dawn Song
cs.AI
Abstract
We introduce AgentSynth, a scalable and cost-efficient pipeline for
automatically synthesizing high-quality tasks and trajectory datasets for
generalist computer-use agents. Leveraging information asymmetry, AgentSynth
constructs subtasks that are simple during generation but significantly more
challenging when composed into long-horizon tasks, enabling the creation of
over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based
task proposer guided by a persona, followed by an execution agent that
completes the task and logs the trajectory. This process is repeated
iteratively to form a sequence of subtasks, which are then summarized by a
separate agent into a composite task of controllable difficulty. A key strength
of AgentSynth is its ability to precisely modulate task complexity by varying
the number of subtasks. Empirical evaluations show that state-of-the-art LLM
agents suffer a steep performance drop, from 18% success at difficulty level 1
to just 4% at level 6, highlighting the benchmark's difficulty and
discriminative power. Moreover, our pipeline achieves a low average cost of
$0.60 per trajectory, orders of magnitude cheaper than human annotations. Our
code and data are publicly available at
https://github.com/sunblaze-ucb/AgentSynth
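
The propose, execute, and summarize loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Subtask`, `synthesize_chain`, `composite_tasks`, and the three stub functions are hypothetical, the LLM-backed proposer, executor, and summarizer are replaced by plain Python callables, and the mapping "difficulty level k = summary of the first k subtasks" is an assumption drawn from the statement that complexity is controlled by the number of subtasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """One proposed subtask plus the actions logged while executing it."""
    description: str
    trajectory: List[str] = field(default_factory=list)


# Hypothetical interfaces standing in for the LLM-based components.
Proposer = Callable[[str, List[Subtask]], str]  # (persona, history) -> next subtask
Executor = Callable[[str], List[str]]           # subtask description -> logged actions
Summarizer = Callable[[List[Subtask]], str]     # subtasks -> composite task text


def synthesize_chain(persona: str, proposer: Proposer, executor: Executor,
                     n_subtasks: int) -> List[Subtask]:
    """Iteratively propose a subtask, execute it, and record the trajectory."""
    history: List[Subtask] = []
    for _ in range(n_subtasks):
        description = proposer(persona, history)  # proposer sees prior subtasks
        trajectory = executor(description)        # executor logs its actions
        history.append(Subtask(description, trajectory))
    return history


def composite_tasks(chain: List[Subtask], summarizer: Summarizer,
                    max_level: int) -> Dict[int, str]:
    """Assumed scheme: level k summarizes the first k subtasks of the chain."""
    top = min(max_level, len(chain))
    return {k: summarizer(chain[:k]) for k in range(1, top + 1)}


# Stub components so the sketch runs without any model calls.
def stub_proposer(persona: str, history: List[Subtask]) -> str:
    return f"step {len(history) + 1} for {persona}"


def stub_executor(description: str) -> List[str]:
    return [f"click:{description}", "done"]


def stub_summarizer(subtasks: List[Subtask]) -> str:
    return " then ".join(s.description for s in subtasks)


chain = synthesize_chain("accountant", stub_proposer, stub_executor, n_subtasks=6)
tasks = composite_tasks(chain, stub_summarizer, max_level=6)
```

Under these assumptions, a single six-subtask chain yields one composite task per difficulty level from 1 to 6, and each subtask stays cheap to generate because the proposer only extends a history it has already observed.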