Large Language Models as Tool Makers

Abstract

Recent research has shown that external tools can enhance the problem-solving ability of large language models (LLMs). However, prior work along this line depends on the availability of existing tools. In this work, we take an initial step towards removing this dependency by proposing a closed-loop framework, referred to as LLMs As Tool Makers (LATM), in which LLMs create their own reusable tools for problem-solving. Our approach consists of two key phases: 1) tool making: an LLM acts as the tool maker and crafts tools for given tasks, where a tool is implemented as a Python utility function; 2) tool using: an LLM acts as the tool user and applies the tool built by the tool maker to solve problems. The tool user can be either the same LLM as the tool maker or a different one. Tool making enables an LLM to continually generate tools that can be applied to different requests, so that future requests can call the corresponding APIs when this is beneficial for solving the task. Furthermore, the division of labor between the tool-making and tool-using phases introduces the opportunity to achieve cost-effectiveness without degrading the quality of the generated tools or the problem solutions. For example, recognizing that tool making demands more sophisticated capabilities than tool using, we can apply a powerful yet resource-intensive model as the tool maker and a lightweight, cost-effective model as the tool user. We validate the effectiveness of our approach across a variety of complex reasoning tasks, including Big-Bench tasks. With GPT-4 as the tool maker and GPT-3.5 as the tool user, LATM can achieve performance on par with using GPT-4 for both tool making and tool using, while the inference cost is significantly reduced.
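The two-phase split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `strong_model` and `light_model` are hypothetical stubs standing in for real LLM calls, and the prompts, the `tool_cache` dictionary, and the `solve` helper are not taken from the paper. The sketch only shows the general shape: the tool is made once, cached, and then reused by a cheaper tool user for each request.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]

# Counts how often the (expensive) tool maker is invoked.
maker_calls = {"count": 0}

def strong_model(prompt: str) -> str:
    """Hypothetical stub for the tool maker: returns Python source for a tool."""
    maker_calls["count"] += 1
    return (
        "def sort_words(words):\n"
        "    # Return the words in alphabetical order.\n"
        "    return sorted(words)\n"
    )

def light_model(prompt: str) -> str:
    """Hypothetical stub for the tool user: returns one call expression."""
    return "sort_words(['pear', 'apple', 'fig'])"

def make_tool(maker: LLM, task: str) -> str:
    """Tool making: ask the maker once for a reusable utility function."""
    return maker(f"Write a Python utility function for this task: {task}")

def use_tool(user: LLM, tool_source: str, request: str):
    """Tool using: the user model writes a call, which is then executed."""
    call = user(f"Tool:\n{tool_source}\nRequest: {request}\nWrite one call.")
    namespace: Dict[str, object] = {}
    # Loading generated code with exec/eval is unsafe for untrusted output;
    # a real system would need sandboxing.
    exec(tool_source, namespace)
    return eval(call, namespace)

# Maps a task description to the source of the tool made for it.
tool_cache: Dict[str, str] = {}

def solve(task: str, request: str):
    """Make the tool only on the first request for a task, then reuse it."""
    if task not in tool_cache:
        tool_cache[task] = make_tool(strong_model, task)
    return use_tool(light_model, tool_cache[task], request)

result = solve("word sorting", "Sort these words: pear, apple, fig")
print(result)  # ['apple', 'fig', 'pear']
```

In this sketch a second call to `solve` with the same task would skip `strong_model` entirely, which is the source of the cost saving the abstract describes when the tool maker is the more expensive model.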
