

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

October 20, 2023
作者: Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, Chao Zhang
cs.AI

Abstract

Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, becoming trapped in locally optimal solutions, or exhaustively traverse all potential actions, resulting in inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A* search algorithm with a task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the lowest-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.
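To make the search formulation concrete, below is a minimal, illustrative sketch of A*-style navigation over an action tree, where each node is a partial sequence of API function calls and the priority of a node is f(n) = g(n) + h(n). The candidate-action generator, per-step cost, and heuristic used here (`expand`, `step_cost`, `heuristic`) are hypothetical stand-ins; in ToolChain* the cost functions are task-specific and derived from LLM signals, which this sketch does not reproduce.

```python
# Minimal A*-style search over a tree of candidate API calls (illustrative only).
import heapq
from itertools import count
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # a partial plan: the sequence of API calls taken so far


def a_star_plan(
    initial_state: State,
    expand: Callable[[State], List[str]],          # candidate API calls from a state
    step_cost: Callable[[State, str], float],      # g-increment for taking a call
    heuristic: Callable[[State], float],           # h: estimated cost-to-go
    is_goal: Callable[[State], bool],
    max_expansions: int = 1000,
) -> State:
    """Return the lowest-cost plan found, or an empty tuple if none is found."""
    tie = count()  # unique tie-breaker so the heap never compares states directly
    frontier = [(heuristic(initial_state), next(tie), 0.0, initial_state)]
    while frontier and max_expansions > 0:
        _, _, g, state = heapq.heappop(frontier)  # node with the smallest f = g + h
        if is_goal(state):
            return state
        max_expansions -= 1
        for action in expand(state):
            child = state + (action,)
            g_child = g + step_cost(state, action)
            f_child = g_child + heuristic(child)
            heapq.heappush(frontier, (f_child, next(tie), g_child, child))
    return ()


if __name__ == "__main__":
    # Toy usage: find a plan ending in "submit" within 3 steps.
    actions = ["search", "lookup", "submit"]
    plan = a_star_plan(
        initial_state=(),
        expand=lambda s: actions if len(s) < 3 else [],
        step_cost=lambda s, a: 1.0,                                   # uniform cost (toy)
        heuristic=lambda s: 0.0 if s and s[-1] == "submit" else 1.0,  # toy cost-to-go
        is_goal=lambda s: bool(s) and s[-1] == "submit",
    )
    print(plan)  # e.g., ('submit',)
```

Because low-f branches are expanded first, high-cost branches are effectively deprioritized rather than enumerated exhaustively, which is the behavior the abstract attributes to ToolChain*'s pruning of likely-incorrect actions.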