LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
November 12, 2025
Authors: Kangning Zhang, Wenxiang Jiao, Kounianhua Du, Yuan Lu, Weiwen Liu, Weinan Zhang, Lei Zhang, Yong Yu
cs.AI
Abstract
Augmenting Large Language Models (LLMs) with external tools enables them to execute complex, multi-step tasks. However, tool learning is hampered by static synthetic data pipelines, in which data generation and model training are executed as two separate, non-interactive processes. This approach fails to adaptively focus on a model's specific weaknesses and allows noisy labels to persist, degrading training efficiency. We introduce LoopTool, a fully automated, model-aware data evolution framework that closes this loop by tightly integrating data synthesis and model training. LoopTool iteratively refines both the data and the model through three synergistic modules: (1) Greedy Capability Probing (GCP) diagnoses the model's mastered and failed capabilities; (2) Judgement-Guided Label Verification (JGLV) uses an open-source judge model to find and correct annotation errors, progressively purifying the dataset; and (3) Error-Driven Data Expansion (EDDE) generates new, challenging samples based on identified failures. This closed-loop process operates within a cost-effective, open-source ecosystem, eliminating dependence on expensive closed-source APIs. Experiments show that our 8B model trained with LoopTool significantly surpasses its 32B data generator and achieves new state-of-the-art results for its scale on the BFCL-v3 and ACEBench benchmarks. Our work demonstrates that closed-loop, self-refining data pipelines can dramatically enhance the tool-use capabilities of LLMs.
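Read as an algorithm, the abstract describes a loop in which capability probing, label verification, and error-driven data expansion feed each training round. The sketch below illustrates only that control flow; every name in it (probe_capabilities, verify_labels, expand_from_errors, train) is a hypothetical placeholder for the GCP, JGLV, and EDDE modules and the fine-tuning step, not the authors' released code.

```python
# Minimal sketch of LoopTool's closed data-training loop, based solely on the
# abstract. All callables and sample fields are illustrative assumptions.

from typing import Callable, List

Sample = dict  # e.g. {"query": ..., "tools": ..., "label": ...}


def loop_tool(
    model,
    dataset: List[Sample],
    probe_capabilities: Callable,   # GCP: split data into mastered / failed cases
    verify_labels: Callable,        # JGLV: judge model separates label noise from true errors
    expand_from_errors: Callable,   # EDDE: synthesize harder samples from failures
    train: Callable,                # one round of fine-tuning on the refined data
    num_rounds: int = 3,
):
    """Iteratively refine both the dataset and the model in a closed loop."""
    for _ in range(num_rounds):
        # (1) Greedy Capability Probing: diagnose what the current model
        #     already handles and where it fails.
        mastered, failed = probe_capabilities(model, dataset)

        # (2) Judgement-Guided Label Verification: an open-source judge
        #     corrects annotation errors among the failed cases, so only
        #     genuine model failures remain.
        corrected, true_failures = verify_labels(failed)

        # (3) Error-Driven Data Expansion: generate new challenging samples
        #     targeted at the identified failure modes.
        new_samples = expand_from_errors(true_failures)

        # The purified and expanded dataset drives the next training round.
        dataset = mastered + corrected + true_failures + new_samples
        model = train(model, dataset)

    return model, dataset
```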