OpenThoughts-Agent: Data Recipes for Agentic Models

Abstract

Agentic language models dramatically expand the applications of AI, yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from our pipeline and fine-tune Qwen3-32B on it. The resulting model reaches an average accuracy of 44.8% across seven agentic benchmarks, a 3.9 percentage point improvement over the strongest existing open-data agentic model (Nemotron-Terminal-32B, 40.9%). Moreover, our training data exhibits strong scaling properties, outperforming alternative open datasets at every training set size in compute-controlled comparisons. We publicly release our training sets, data pipeline, experimental data, and models at openthoughts.ai to support future open research on agentic model training.