NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Abstract

Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in a nested order, where a later tool call takes the response of an earlier call as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluation. NesTools comprises a novel automatic data generation method that constructs large-scale nested tool calls with different nesting structures. After manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. NesTools can therefore serve as a new benchmark for evaluating the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with the complex nested tool learning task.
