TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
November 19, 2023
Authors: Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated proficiency in addressing
tasks that necessitate a combination of task planning and the usage of external
tools, such as APIs. However, real-world complex systems present three
prevalent challenges concerning task planning and tool usage: (1) The real
system usually has a vast array of APIs, so it is impossible to feed the
descriptions of all APIs to the prompt of LLMs as the token length is limited;
(2) the real system is designed for handling complex tasks, and the base LLMs
can hardly plan a correct sub-task order and API-calling order for such tasks;
(3) Similar semantics and functionalities among APIs in real systems create
challenges for both LLMs and even humans in distinguishing between them. In
response, this paper introduces a comprehensive framework aimed at enhancing
the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating
within real-world systems. Our framework comprises three key components
designed to address these challenges: (1) the API Retriever selects the most
pertinent APIs for the user task among the extensive array available; (2) the
LLM Finetuner tunes a base LLM so that the finetuned LLM is more capable of
task planning and API calling; (3) the Demo Selector adaptively retrieves
different demonstrations related to hard-to-distinguish APIs, which are further
used for in-context learning to boost the final performance. We validate our
methods using a real-world commercial system as well as an open-sourced
academic dataset, and the outcomes clearly showcase the efficacy of each
individual component as well as the integrated framework.
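
To make the division of labour among the three components concrete, the sketch below outlines how an API Retriever, Demo Selector, and finetuned LLM could be wired together at inference time. It is a minimal illustration under stated assumptions, not the paper's implementation: the `API_DOCS` and `DEMOS` corpora, the `call_finetuned_llm` stub, and the bag-of-words similarity standing in for the learned retrievers are all hypothetical placeholders.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Placeholder for a learned dense encoder: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, corpus: dict, k: int) -> list:
    # Rank corpus entries by similarity to the query and keep the k best names.
    q = embed(query)
    ranked = sorted(corpus, key=lambda name: cosine(q, embed(corpus[name])), reverse=True)
    return ranked[:k]

# Hypothetical API descriptions and demonstrations; a real system would hold thousands.
API_DOCS = {
    "search_orders": "look up customer orders by id or date range",
    "refund_order": "issue a refund for a completed order",
    "send_email": "send an email notification to a customer",
}
DEMOS = {
    "refund_demo": "Task: refund order 42 -> plan: search_orders, refund_order, send_email",
    "lookup_demo": "Task: find yesterday's orders -> plan: search_orders",
}

def call_finetuned_llm(prompt: str) -> str:
    # Stub standing in for the finetuned planner/executor LLM.
    return f"[plan generated from prompt of {len(prompt)} chars]"

def solve(task: str) -> str:
    apis = top_k(task, API_DOCS, k=2)   # (1) API Retriever: keep only the relevant APIs
    demos = top_k(task, DEMOS, k=1)     # (3) Demo Selector: pick demonstrations for in-context learning
    prompt = "\n".join(
        ["Available APIs:"] + [f"- {a}: {API_DOCS[a]}" for a in apis]
        + ["Demonstrations:"] + [DEMOS[d] for d in demos]
        + [f"Task: {task}"]
    )
    return call_finetuned_llm(prompt)   # (2) the finetuned LLM plans sub-tasks and API calls

print(solve("refund the order placed last Monday and notify the customer"))
```

The point of the sketch is the prompt-budget argument from the abstract: only the retrieved API descriptions and demonstrations reach the LLM's context, so the prompt stays within the token limit regardless of how many APIs the system exposes.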