ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
October 20, 2023
Authors: Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, Chao Zhang
cs.AI
Abstract
Large language models (LLMs) have demonstrated powerful decision-making and
planning capabilities in solving complicated real-world problems. LLM-based
autonomous agents can interact with diverse tools (e.g., functional APIs) and
generate solution plans that execute a series of API function calls in a
step-by-step manner. The multitude of candidate API function calls
significantly expands the action space, amplifying the critical need for
efficient action space navigation. However, existing methods either explore
expansive action spaces in a single direction and become trapped in locally
optimal solutions, or exhaustively traverse all potential actions, which makes
navigation inefficient. To address these issues, we propose
ToolChain*, an efficient tree search-based planning algorithm for LLM-based
agents. It formulates the entire action space as a decision tree, where each
node represents a possible API function call involved in a solution plan. By
incorporating the A* search algorithm with task-specific cost function design,
it efficiently prunes high-cost branches that may involve incorrect actions,
identifying the lowest-cost valid path as the solution. Extensive experiments
on multiple tool-use and reasoning tasks demonstrate that ToolChain*
efficiently balances exploration and exploitation within an expansive action
space. It outperforms state-of-the-art baselines on planning and reasoning
tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time,
respectively.
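
The search idea described in the abstract can be illustrated with a minimal
sketch of generic A* search over sequences of API calls. This is not the
authors' implementation: the tool names, the target plan, and the cost and
heuristic functions below are hypothetical stand-ins for the task-specific
cost functions the abstract mentions, which are not specified here.

```python
import heapq
from itertools import count
from typing import Callable, Iterable, Optional, Tuple

Plan = Tuple[str, ...]  # a plan is the sequence of API call names chosen so far


def a_star_plan(
    expand: Callable[[Plan], Iterable[str]],
    step_cost: Callable[[Plan, str], float],
    heuristic: Callable[[Plan], float],
    is_goal: Callable[[Plan], bool],
    max_expansions: int = 1000,
) -> Optional[Plan]:
    """Best-first search over plans, ordered by f(n) = g(n) + h(n)."""
    tie = count()  # unique tie-breaker so the heap never compares plans
    frontier = [(heuristic(()), next(tie), 0.0, ())]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, g, plan = heapq.heappop(frontier)  # lowest f first
        if is_goal(plan):
            return plan
        for action in expand(plan):
            child = plan + (action,)
            g_child = g + step_cost(plan, action)
            f_child = g_child + heuristic(child)
            heapq.heappush(frontier, (f_child, next(tie), g_child, child))
    return None


# Hypothetical toy task: the tool names and target plan are made up.
TOOLS = ["search_flights", "book_flight", "send_confirmation", "cancel_booking"]
TARGET = ("search_flights", "book_flight", "send_confirmation")


def expand(plan: Plan) -> Iterable[str]:
    # Every tool is a candidate next call until the plan reaches full length.
    return TOOLS if len(plan) < len(TARGET) else []


def step_cost(plan: Plan, action: str) -> float:
    # Assumed cost: cheap if the call matches the target step, expensive otherwise.
    return 1.0 if action == TARGET[len(plan)] else 5.0


def heuristic(plan: Plan) -> float:
    # Each remaining step costs at least 1, so this never overestimates.
    return float(len(TARGET) - len(plan))


result = a_star_plan(expand, step_cost, heuristic, lambda p: p == TARGET)
print(result)
```

In this toy setup, branches that begin with a mismatched call receive a higher
f value and stay in the frontier unexpanded, which mirrors the pruning of
high-cost branches described above. In a real agent, the candidate actions and
costs would presumably come from the LLM and the task, not from a fixed target.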