FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
May 9, 2023
Authors: Lingjiao Chen, Matei Zaharia, James Zou
cs.AI
Abstract
There is a rapidly growing number of large language models (LLMs) that users
can query for a fee. We review the cost associated with querying popular LLM
APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have
heterogeneous pricing structures, with fees that can differ by two orders of
magnitude. In particular, using LLMs on large collections of queries and text
can be expensive. Motivated by this, we outline and discuss three types of
strategies that users can exploit to reduce the inference cost associated with
using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As
an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM
cascade which learns which combinations of LLMs to use for different queries in
order to reduce cost and improve accuracy. Our experiments show that FrugalGPT
can match the performance of the best individual LLM (e.g. GPT-4) with up to
98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost.
The ideas and findings presented here lay a foundation for using LLMs
sustainably and efficiently.
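The LLM cascade strategy the abstract describes can be illustrated with a minimal sketch: try models from cheapest to most expensive, and accept an answer as soon as a reliability scorer clears that tier's threshold. The `cascade` function, the toy models, and `toy_score` below are all hypothetical stand-ins (the paper's actual scorer is learned from data, and real tiers would wrap paid API calls).

```python
# Minimal sketch of an LLM cascade (hedged: names and scorer are illustrative,
# not the paper's implementation; real tiers would call paid LLM APIs).
from typing import Callable, List, Tuple

def cascade(query: str,
            tiers: List[Tuple[str, float, Callable[[str], str]]],
            score_fn: Callable[[str, str], float],
            thresholds: List[float]) -> Tuple[str, str, float]:
    """Query models cheapest-first; return the first answer whose
    reliability score meets that tier's threshold, plus cumulative cost."""
    total_cost = 0.0
    for (name, cost, generate), threshold in zip(tiers, thresholds):
        answer = generate(query)
        total_cost += cost
        if score_fn(query, answer) >= threshold:
            return name, answer, total_cost
    # All tiers exhausted: fall back to the last (strongest) model's answer.
    return name, answer, total_cost

# Toy stand-ins: a cheap model that only knows one fact, a strong model
# that always answers, and a scorer that flags "unsure" as unreliable.
def cheap_model(q: str) -> str:  return "Paris" if "France" in q else "unsure"
def strong_model(q: str) -> str: return "Paris" if "France" in q else "Berlin"
def toy_score(q: str, a: str) -> float: return 0.0 if a == "unsure" else 1.0

tiers = [("cheap", 0.01, cheap_model), ("strong", 1.00, strong_model)]
# Easy query: the cheap tier's answer clears the threshold, so we stop early.
result_easy = cascade("Capital of France?", tiers, toy_score, [0.8, 0.0])
# Hard query: the cheap tier scores low, so we escalate to the strong model.
result_hard = cascade("Capital of Germany?", tiers, toy_score, [0.8, 0.0])
```

The cost savings come from the last-tier threshold being a fallback (set to 0.0 here) while earlier thresholds gate escalation: queries the cheap model handles confidently never incur the expensive model's fee.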