Agent Data Protocol:统一数据集以实现多样化、高效的大型语言模型智能体微调
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
October 28, 2025
作者: Yueqi Song, Ketan Ramaneti, Zaid Sheikh, Ziru Chen, Boyu Gou, Tianbao Xie, Yiheng Xu, Danyang Zhang, Apurva Gandhi, Fan Yang, Joseph Liu, Tianyue Ou, Zhihao Yuan, Frank Xu, Shuyan Zhou, Xingyao Wang, Xiang Yue, Tao Yu, Huan Sun, Yu Su, Graham Neubig
cs.AI
摘要
关于AI智能体大规模监督微调的公开研究成果仍相对稀缺,这主要源于智能体训练数据收集面临独特挑战。本文提出,当前瓶颈并非底层数据源匮乏,而是海量数据分散在异构的格式、工具与接口中。为此,我们推出智能体数据协议——一种轻量级表示语言,可作为不同格式智能体数据集与下游统一训练流程之间的"中介语言"。ADP的设计既能充分表达各类任务(包括API/工具调用、网页浏览、编程、软件工程及通用智能体工作流),又无需针对每个数据集进行工程化处理即可轻松解析和训练。实验中,我们将13个现有智能体训练数据集统一转换为ADP格式,并将标准化后的ADP数据适配至多个智能体框架的训练就绪格式。基于这些数据的监督微调实验表明:相比基线模型平均性能提升约20%,在编程、浏览、工具使用及研究基准测试中达到或接近最先进水平,且无需领域特定调优。所有代码与数据均已开源,期望ADP能助力降低标准化、可扩展、可复现的智能体训练门槛。
English
Public research results on large-scale supervised finetuning of AI agents
remain relatively rare, since the collection of agent training data presents
unique challenges. In this work, we argue that the bottleneck is not a lack of
underlying data sources, but that a large variety of data is fragmented across
heterogeneous formats, tools, and interfaces. To this end, we introduce the
agent data protocol (ADP), a light-weight representation language that serves
as an "interlingua" between agent datasets in diverse formats and unified agent
training pipelines downstream. The design of ADP is expressive enough to
capture a large variety of tasks, including API/tool use, browsing, coding,
software engineering, and general agentic workflows, while remaining simple to
parse and train on without engineering at a per-dataset level. In experiments,
we unified a broad collection of 13 existing agent training datasets into ADP
format, and converted the standardized ADP data into training-ready formats for
multiple agent frameworks. We performed SFT on these data, and demonstrated an
average performance gain of ~20% over corresponding base models, and delivers
state-of-the-art or near-SOTA performance on standard coding, browsing, tool
use, and research benchmarks, without domain-specific tuning. All code and data
are released publicly, in the hope that ADP could help lower the barrier to
standardized, scalable, and reproducible agent training.