Large Language Models as Tool Makers
May 26, 2023
Authors: Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou
cs.AI
Abstract
Recent research shows the potential of enhancing the problem-solving ability
of large language models (LLMs) through the use of external tools. However,
prior work along this line depends on the availability of existing tools. In
this work, we take an initial step towards removing this dependency by
proposing a closed-loop framework, referred to as LLMs As Tool Makers (LATM),
where LLMs create their own reusable tools for problem-solving. Our approach
consists of two key phases: (1) tool making, in which an LLM acts as the tool
maker and crafts tools for given tasks, each tool being implemented as a Python
utility function; and (2) tool using, in which an LLM acts as the tool user and
applies the tool built by the tool maker to solve problems. The tool user can be
either the same LLM as the tool maker or a different one. Tool-making enables an LLM to
continually generate tools that can be applied to different requests so that
future requests can call the corresponding APIs when beneficial for solving the
tasks. Furthermore, the division of labor among LLMs for tool-making and
tool-using phases introduces the opportunity to achieve cost effectiveness
without degrading the quality of generated tools and problem solutions. For
example, recognizing that tool-making demands more sophisticated capabilities
than tool-using, we can apply a powerful yet resource-intensive model as the
tool maker and a lightweight, cost-effective model as the tool user. We
validate the effectiveness of our approach across a variety of complex
reasoning tasks, including Big-Bench tasks. With GPT-4 as the tool maker and
GPT-3.5 as the tool user, LATM can achieve performance that is on par with
using GPT-4 for both tool making and tool using, while the inference cost is
significantly reduced.
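
The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names (`make_tool`, `use_tool`, `solve_requests`), the convention that a tool is a function called `solve`, and the stub models are all assumptions made for illustration. A "model" is represented as any callable from prompt text to completion text, so a strong model could be passed as the tool maker and a lighter one as the tool user.

```python
import re
from typing import Callable, Dict, List

# Assumption: a model is any callable mapping a prompt string to a completion.
LLM = Callable[[str], str]


def make_tool(tool_maker: LLM, task_description: str) -> str:
    """Tool making: ask the tool maker for a reusable Python function."""
    prompt = "Write a Python function named `solve` for this task:\n" + task_description
    return tool_maker(prompt)


def use_tool(tool_user: LLM, tool_source: str, request: str):
    """Tool using: ask the tool user for a call expression, then evaluate it."""
    namespace: Dict[str, object] = {}
    # Unsandboxed execution of generated code; acceptable only in a demo.
    exec(tool_source, namespace)
    call_expr = tool_user(f"Tool:\n{tool_source}\nRequest: {request}\nCall:")
    return eval(call_expr, namespace)


def solve_requests(tool_maker: LLM, tool_user: LLM, task: str, requests: List[str]) -> list:
    """Make the tool once, then reuse it for every incoming request."""
    tool_source = make_tool(tool_maker, task)
    return [use_tool(tool_user, tool_source, r) for r in requests]


class CountingLLM:
    """Wraps a model and counts how often it is called (hypothetical helper)."""

    def __init__(self, model: LLM) -> None:
        self.model = model
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.model(prompt)


def fake_tool_maker(prompt: str) -> str:
    # Stand-in for a strong model: always returns a sorting utility.
    return "def solve(items):\n    return sorted(items)\n"


def fake_tool_user(prompt: str) -> str:
    # Stand-in for a light model: turns the request's numbers into a call.
    request_line = next(
        line for line in prompt.splitlines() if line.startswith("Request: ")
    )
    numbers = [int(n) for n in re.findall(r"-?\d+", request_line)]
    return f"solve({numbers})"
```

In this sketch the tool maker is invoked once per task while the tool user is invoked once per request, which is the division of labor the abstract relies on for its cost argument: the expensive model's cost is amortized over all requests that reuse the tool.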