NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
October 15, 2024
Authors: Han Han, Tong Zhu, Xiang Zhang, Mengsong Wu, Hao Xiong, Wenliang Chen
cs.AI
Abstract
Large language models (LLMs) combined with tool learning have achieved
impressive results in real-world applications. During tool learning, LLMs may
call multiple tools in a nested order, where a later tool call may take the
response of an earlier call as its input parameters. However, the nested tool
learning capabilities of LLMs remain under-explored, since existing
benchmarks lack relevant data instances. To address this problem, we
introduce NesTools to bridge the current gap in comprehensive nested tool
learning evaluations. NesTools comprises a novel automatic data generation
method to construct large-scale nested tool calls with different nesting
structures. With manual review and refinement, the dataset is of high quality
and closely aligned with real-world scenarios. Therefore, NesTools can serve as
a new benchmark to evaluate the nested tool learning abilities of LLMs. We
conduct extensive experiments on 22 LLMs and provide in-depth analyses with
NesTools, which show that current LLMs still struggle with complex nested
tool learning tasks.
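
To make the notion of a nested tool call concrete, here is a minimal sketch in which a later call consumes the response of an earlier one. Everything in it is an illustrative assumption rather than the NesTools format: the toy tools `get_city` and `get_weather`, the `{"from_call": i}` reference convention, and the `run_calls` helper are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical toy tools; the abstract does not describe the real tool set.
def get_city(user: str) -> str:
    """Return the home city of a user (toy lookup table)."""
    return {"alice": "Suzhou"}.get(user, "unknown")

def get_weather(city: str) -> str:
    """Return a canned weather report for a city."""
    return f"sunny in {city}"

TOOLS: dict[str, Callable[..., Any]] = {
    "get_city": get_city,
    "get_weather": get_weather,
}

def run_calls(calls: list[dict]) -> list[Any]:
    """Execute tool calls in order and return every response.

    An argument written as {"from_call": i} (an assumed convention) is
    replaced by the response of the i-th earlier call, which is what
    makes the call sequence nested.
    """
    responses: list[Any] = []
    for call in calls:
        args = {}
        for key, value in call["args"].items():
            if isinstance(value, dict) and "from_call" in value:
                idx = value["from_call"]
                # A call may only depend on calls that have already run.
                if not 0 <= idx < len(responses):
                    raise ValueError(f"call {idx} has no response yet")
                args[key] = responses[idx]
            else:
                args[key] = value
        responses.append(TOOLS[call["name"]](**args))
    return responses

# The second call takes the first call's response as its `city` parameter.
calls = [
    {"name": "get_city", "args": {"user": "alice"}},
    {"name": "get_weather", "args": {"city": {"from_call": 0}}},
]
results = run_calls(calls)
```

In this sketch a model would have to predict both the call order and the dependency between the calls. A wrong or missing reference makes the second call fail even when its tool name is correct, which hints at why nested calls are harder to get right than independent ones.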