NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
October 15, 2024
Authors: Han Han, Tong Zhu, Xiang Zhang, Mengsong Wu, Hao Xiong, Wenliang Chen
cs.AI
Abstract
Large language models (LLMs) combined with tool learning have achieved
impressive results in real-world applications. During tool learning, LLMs may
call multiple tools in a nested order, where a later tool call takes the
response of an earlier call as its input parameters. However, nested tool
learning capabilities remain under-explored, since existing benchmarks lack
relevant data instances. To address this problem, we introduce NesTools to
bridge the current gap in comprehensive nested tool learning evaluation.
NesTools comprises a novel automatic data generation method that constructs
large-scale nested tool calls with different nesting structures. With manual
review and refinement, the dataset is of high quality and closely aligned with
real-world scenarios. Therefore, NesTools can serve as a new benchmark to
evaluate the nested tool learning abilities of LLMs. We conduct extensive
experiments on 22 LLMs and provide in-depth analyses with NesTools, which show
that current LLMs still struggle with complex nested tool learning tasks.
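
To make the idea of a nested tool call concrete, here is a minimal Python sketch in which the second call consumes the response of the first. It is only an illustration: the two toy tools, the `run_calls` helper, and the `"$<id>"` placeholder convention for referring to an earlier response are all assumptions made for this example and are not taken from the NesTools dataset or its data format.

```python
from typing import Any, Callable, Dict, List


# Hypothetical toy tools (assumptions for illustration, not NesTools tools).
def get_city_of_user(user: str) -> str:
    return {"alice": "Paris"}[user]


def get_weather(city: str) -> str:
    return {"Paris": "sunny"}[city]


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_city_of_user": get_city_of_user,
    "get_weather": get_weather,
}


def run_calls(calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Execute calls in order, resolving "$<id>" arguments to earlier responses."""
    results: Dict[str, Any] = {}
    for call in calls:
        args = {}
        for key, value in call["args"].items():
            if isinstance(value, str) and value.startswith("$"):
                # Nested dependency: reuse the response of an earlier call.
                args[key] = results[value[1:]]
            else:
                args[key] = value
        results[call["id"]] = TOOLS[call["name"]](**args)
    return results


# The second call takes the first call's response as its input parameter.
calls = [
    {"id": "c1", "name": "get_city_of_user", "args": {"user": "alice"}},
    {"id": "c2", "name": "get_weather", "args": {"city": "$c1"}},
]
results = run_calls(calls)
print(results["c2"])
```

In this sketch a model would have to produce both the right call order and the right reference from `c2` back to `c1`, which is the kind of dependency the abstract describes as nested.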