LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
November 12, 2025
Authors: Kangning Zhang, Wenxiang Jiao, Kounianhua Du, Yuan Lu, Weiwen Liu, Weinan Zhang, Lei Zhang, Yong Yu
cs.AI
Abstract
Augmenting Large Language Models (LLMs) with external tools enables them to execute complex, multi-step tasks. However, tool learning is hampered by static synthetic data pipelines, in which data generation and model training are executed as two separate, non-interactive processes. This approach fails to adaptively focus on a model's specific weaknesses and allows noisy labels to persist, degrading training efficiency. We introduce LoopTool, a fully automated, model-aware data evolution framework that closes this loop by tightly integrating data synthesis and model training. LoopTool iteratively refines both the data and the model through three synergistic modules: (1) Greedy Capability Probing (GCP) diagnoses the model's mastered and failed capabilities; (2) Judgement-Guided Label Verification (JGLV) uses an open-source judge model to find and correct annotation errors, progressively purifying the dataset; and (3) Error-Driven Data Expansion (EDDE) generates new, challenging samples based on identified failures. This closed-loop process operates within a cost-effective, open-source ecosystem, eliminating dependence on expensive closed-source APIs. Experiments show that our 8B model trained with LoopTool significantly surpasses its 32B data generator and achieves new state-of-the-art results for its scale on the BFCL-v3 and ACEBench benchmarks. Our work demonstrates that closed-loop, self-refining data pipelines can dramatically enhance the tool-use capabilities of LLMs.
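Read as an algorithm, the abstract describes a loop in which capability probing, label verification, and error-driven data expansion feed each training round. The sketch below illustrates only that control flow; every name in it (probe_capabilities, verify_labels, expand_from_errors, train) is a hypothetical placeholder for the GCP, JGLV, and EDDE modules and the fine-tuning step, not the authors' released code.

```python
# Minimal sketch of LoopTool's closed data-training loop, based solely on the
# abstract. All callables and sample fields are illustrative assumptions.

from typing import Callable, List

Sample = dict  # e.g. {"query": ..., "tools": ..., "label": ...}


def loop_tool(
    model,
    dataset: List[Sample],
    probe_capabilities: Callable,   # GCP: split data into mastered / failed cases
    verify_labels: Callable,        # JGLV: judge model separates label noise from true errors
    expand_from_errors: Callable,   # EDDE: synthesize harder samples from failures
    train: Callable,                # one round of fine-tuning on the refined data
    num_rounds: int = 3,
):
    """Iteratively refine both the dataset and the model in a closed loop."""
    for _ in range(num_rounds):
        # (1) Greedy Capability Probing: diagnose what the current model
        #     already handles and where it fails.
        mastered, failed = probe_capabilities(model, dataset)

        # (2) Judgement-Guided Label Verification: an open-source judge
        #     corrects annotation errors among the failed cases, so only
        #     genuine model failures remain.
        corrected, true_failures = verify_labels(failed)

        # (3) Error-Driven Data Expansion: generate new challenging samples
        #     targeted at the identified failure modes.
        new_samples = expand_from_errors(true_failures)

        # The purified and expanded dataset drives the next training round.
        dataset = mastered + corrected + true_failures + new_samples
        model = train(model, dataset)

    return model, dataset
```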