TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
November 19, 2023
Authors: Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated proficiency in addressing
tasks that necessitate a combination of task planning and the usage of
external tools, such as APIs. However, real-world complex systems present three
prevalent challenges concerning task planning and tool usage: (1) The real
system usually has a vast array of APIs, so it is impossible to feed the
descriptions of all APIs to the prompt of LLMs as the token length is limited;
(2) The real system is designed for handling complex tasks, and the base LLMs
can hardly plan a correct sub-task order and API-calling order for such tasks;
(3) Similar semantics and functionalities among APIs in real systems create
challenges for LLMs, and even humans, in distinguishing between them. In
response, this paper introduces a comprehensive framework aimed at enhancing
the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating
within real-world systems. Our framework comprises three key components
designed to address these challenges: (1) the API Retriever selects the most
pertinent APIs for the user task from the extensive array available; (2) the LLM
Finetuner tunes a base LLM so that the finetuned LLM is more capable of
task planning and API calling; (3) the Demo Selector adaptively retrieves
different demonstrations related to hard-to-distinguish APIs, which are then
used for in-context learning to boost the final performance. We validate our
methods using a real-world commercial system as well as an open-source
academic dataset, and the outcomes clearly showcase the efficacy of each
individual component as well as the integrated framework.
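
To make the division of labor among the three components concrete, the sketch below wires a toy retrieve-then-prompt pipeline together. It is only an illustration under stated assumptions: embed() is a hypothetical placeholder for any sentence-embedding model, the API names and demonstrations are invented, and the paper's actual API Retriever and Demo Selector are trained modules whose details are not reproduced here; the assembled prompt would then be handed to the finetuned LLM for planning and API calling.

```python
# Minimal sketch of the retrieve -> select -> prompt flow (not the paper's implementation).
from dataclasses import dataclass
import math

def embed(text: str) -> list[float]:
    # Hypothetical placeholder: hash characters into a small unit vector.
    # Replace with a real sentence-embedding model in practice.
    vec = [0.0] * 64
    for i, ch in enumerate(text):
        vec[i % 64] += ord(ch) / 255.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

@dataclass
class API:
    name: str
    description: str

def retrieve_apis(task: str, apis: list[API], k: int = 3) -> list[API]:
    """API Retriever stand-in: keep only the k APIs most similar to the task,
    so the prompt stays within the LLM's token budget."""
    q = embed(task)
    ranked = sorted(apis, key=lambda a: cosine(q, embed(a.description)), reverse=True)
    return ranked[:k]

def select_demos(task: str, demos: dict[str, str], apis: list[API], m: int = 2) -> list[str]:
    """Demo Selector stand-in: pick demonstrations tied to the retrieved
    (possibly hard-to-distinguish) APIs for in-context learning."""
    q = embed(task)
    candidates = [demos[a.name] for a in apis if a.name in demos]
    return sorted(candidates, key=lambda d: cosine(q, embed(d)), reverse=True)[:m]

def build_prompt(task: str, apis: list[API], demos: list[str]) -> str:
    """Assemble the prompt fed to the (finetuned) LLM for planning and API calling."""
    api_block = "\n".join(f"- {a.name}: {a.description}" for a in apis)
    demo_block = "\n\n".join(demos)
    return f"Available APIs:\n{api_block}\n\nExamples:\n{demo_block}\n\nTask: {task}\nPlan:"

if __name__ == "__main__":
    apis = [API("weather.query", "Get the weather forecast for a city."),
            API("stock.price", "Fetch the latest stock price for a ticker."),
            API("calendar.add", "Add an event to the user's calendar.")]
    demos = {"weather.query": "Task: rain in Paris? Plan: call weather.query(city='Paris')"}
    task = "Will it rain in Berlin tomorrow?"
    chosen = retrieve_apis(task, apis, k=2)
    print(build_prompt(task, chosen, select_demos(task, demos, chosen)))
```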