ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

Abstract

Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls step by step. The multitude of candidate API function calls significantly expands the action space, which makes efficient action space navigation critical. However, existing methods either explore expansive action spaces in a single direction and become trapped in a locally optimal solution, or exhaustively traverse all potential actions, which makes navigation inefficient. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By combining the A* search algorithm with a task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions and identifies the lowest-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.
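The search described in the abstract can be illustrated with a minimal sketch of A* over a tree of partial plans. This is not the authors' implementation: the callbacks `expand`, `heuristic`, and `is_goal`, the toy API names, and all cost values are hypothetical stand-ins, whereas the paper's task-specific cost functions are not specified in the abstract. The sketch only shows how ranking nodes by accumulated cost plus estimated future cost leaves expensive branches unexpanded.

```python
import heapq
from itertools import count
from typing import Callable, Iterable, Optional, Tuple

Plan = Tuple[str, ...]  # a partial plan: the sequence of API calls chosen so far


def a_star_plan(
    expand: Callable[[Plan], Iterable[Tuple[str, float]]],  # hypothetical: candidate next calls with step costs
    heuristic: Callable[[Plan], float],                     # hypothetical: estimated remaining cost h(n)
    is_goal: Callable[[Plan], bool],                        # hypothetical: does the plan solve the task?
    max_expansions: int = 1000,
) -> Optional[Plan]:
    """Return the first goal plan popped in order of f(n) = g(n) + h(n), or None."""
    tie = count()  # tie-breaker so the heap never has to compare plans directly
    frontier = [(heuristic(()), next(tie), 0.0, ())]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, g, plan = heapq.heappop(frontier)  # node with the lowest f(n)
        if is_goal(plan):
            return plan
        for call, step_cost in expand(plan):
            child = plan + (call,)
            g_child = g + step_cost
            f_child = g_child + heuristic(child)
            heapq.heappush(frontier, (f_child, next(tie), g_child, child))
    return None


# Toy decision tree with made-up API names and costs (assumptions for illustration only).
TOY_TREE = {
    (): [("search_flights", 1.0), ("book_hotel", 4.0)],
    ("search_flights",): [("book_flight", 1.0), ("cancel_flight", 5.0)],
    ("book_hotel",): [("book_flight", 1.0)],
}


def toy_expand(plan: Plan):
    return TOY_TREE.get(plan, [])


def toy_is_goal(plan: Plan) -> bool:
    return len(plan) > 0 and plan[-1] == "book_flight"


def toy_heuristic(plan: Plan) -> float:
    # Every step in the toy tree costs at least 1.0, so this never overestimates.
    return 0.0 if toy_is_goal(plan) else 1.0


best = a_star_plan(toy_expand, toy_heuristic, toy_is_goal)
print(best)
```

In this toy tree the branch starting with `book_hotel` is never expanded, because a cheaper complete plan is found first. In a real agent, the candidate calls and their costs would presumably come from the LLM and the task-specific cost design rather than from a fixed table.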
