TaskCraft: Automated Generation of Agentic Tasks

Abstract

Agentic tasks, which require multi-step problem solving with autonomy, tool use, and adaptive reasoning, are becoming increasingly central to the advancement of NLP and AI. However, existing instruction data lacks tool interaction, and current agentic benchmarks rely on costly human annotation, limiting their scalability. We introduce TaskCraft, an automated workflow for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories. TaskCraft expands atomic tasks using depth-based and width-based extensions to create structurally and hierarchically complex challenges. Empirical results show that these tasks improve prompt optimization in the generation workflow and enhance supervised fine-tuning of agentic foundation models. We present a large-scale synthetic dataset of approximately 36,000 tasks with varying difficulty to support future research on agent tuning and evaluation.
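The depth-based and width-based extensions mentioned above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Task` record, its fields, the `extend_depth` and `extend_width` helpers, and the toy arithmetic questions are hypothetical and are not taken from the TaskCraft implementation. The sketch only shows the structural idea that a depth extension chains a new step onto an existing task, while a width extension merges independent tasks into one compound task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a question, a checkable answer, and the tools needed."""
    question: str
    answer: str
    tools: tuple[str, ...]
    hops: int = 1  # sequential steps required; a crude stand-in for difficulty


def extend_depth(inner: Task, outer_template: str, outer_answer: str, outer_tool: str) -> Task:
    """Depth-based extension (illustrative): wrap an existing task in a new step.

    The template has an `{x}` slot that is filled with the inner question, so
    the inner task must be solved before the outer one can be answered.
    """
    return Task(
        question=outer_template.format(x=f"({inner.question})"),
        answer=outer_answer,
        tools=inner.tools + (outer_tool,),
        hops=inner.hops + 1,
    )


def extend_width(tasks: list[Task]) -> Task:
    """Width-based extension (illustrative): merge independent tasks into one."""
    return Task(
        question=" Also: ".join(t.question for t in tasks),
        answer="; ".join(t.answer for t in tasks),
        # Keep each tool once, preserving first-seen order.
        tools=tuple(dict.fromkeys(tool for t in tasks for tool in t.tools)),
        # Sub-tasks are independent, so the longest chain sets the depth.
        hops=max(t.hops for t in tasks),
    )


# Toy atomic tasks (assumed examples, not from the dataset).
atomic_a = Task("What is 6 * 7?", "42", ("calculator",))
atomic_b = Task("What is 10 + 5?", "15", ("calculator",))

deep = extend_depth(
    atomic_a,
    "What is the floor of the square root of the answer to {x}?",
    "6",
    "code_interpreter",
)
wide = extend_width([atomic_a, atomic_b])

print(deep.question, deep.hops, deep.tools)
print(wide.question, wide.answer, wide.tools)
```

In this toy form, each extended task keeps a known answer, which is what makes the result verifiable; how the actual workflow generates and verifies extensions is not specified in the abstract.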