代理數據協議:統一數據集以實現多樣化、高效的LLM代理微調
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
October 28, 2025
作者: Yueqi Song, Ketan Ramaneti, Zaid Sheikh, Ziru Chen, Boyu Gou, Tianbao Xie, Yiheng Xu, Danyang Zhang, Apurva Gandhi, Fan Yang, Joseph Liu, Tianyue Ou, Zhihao Yuan, Frank Xu, Shuyan Zhou, Xingyao Wang, Xiang Yue, Tao Yu, Huan Sun, Yu Su, Graham Neubig
cs.AI
摘要
關於大規模監督式微調人工智慧代理的公開研究成果目前仍相對稀少,主要原因在於代理訓練資料的收集存在獨特挑戰。本研究主張,瓶頸並非在於底層資料來源的匱乏,而是大量多樣化的資料分散在異質性格式、工具與介面中。為此,我們提出代理資料協定(ADP)——一種輕量級的表示語言,可作為不同格式代理資料集與下游統一代理訓練流程之間的「中介語言」。ADP的設計具備足夠表達力,能涵蓋多種類型任務(包括API/工具使用、瀏覽、編程、軟體工程及通用代理工作流),同時保持易解析性,無需針對單一資料集進行工程化處理即可直接訓練。實驗中,我們將13個現有代理訓練資料集統一轉換為ADP格式,並把標準化後的ADP資料轉為多種代理框架可直接訓練的格式。經監督式微調後,模型在標準編程、瀏覽、工具使用及研究基準測試中,相較基礎模型平均效能提升約20%,且無需領域特定調優即達到業界頂尖或接近頂尖水準。所有程式碼與資料均已公開釋出,期望ADP能助力降低標準化、可擴展且可重現的代理訓練門檻。
English
Public research results on large-scale supervised finetuning of AI agents
remain relatively rare, since the collection of agent training data presents
unique challenges. In this work, we argue that the bottleneck is not a lack of
underlying data sources, but that a large variety of data is fragmented across
heterogeneous formats, tools, and interfaces. To this end, we introduce the
agent data protocol (ADP), a light-weight representation language that serves
as an "interlingua" between agent datasets in diverse formats and unified agent
training pipelines downstream. The design of ADP is expressive enough to
capture a large variety of tasks, including API/tool use, browsing, coding,
software engineering, and general agentic workflows, while remaining simple to
parse and train on without engineering at a per-dataset level. In experiments,
we unified a broad collection of 13 existing agent training datasets into ADP
format, and converted the standardized ADP data into training-ready formats for
multiple agent frameworks. We performed SFT on these data, and demonstrated an
average performance gain of ~20% over corresponding base models, and delivers
state-of-the-art or near-SOTA performance on standard coding, browsing, tool
use, and research benchmarks, without domain-specific tuning. All code and data
are released publicly, in the hope that ADP could help lower the barrier to
standardized, scalable, and reproducible agent training.