FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
May 9, 2023
Authors: Lingjiao Chen, Matei Zaharia, James Zou
cs.AI
Abstract
There is a rapidly growing number of large language models (LLMs) that users
can query for a fee. We review the cost associated with querying popular LLM
APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have
heterogeneous pricing structures, with fees that can differ by two orders of
magnitude. In particular, using LLMs on large collections of queries and text
can be expensive. Motivated by this, we outline and discuss three types of
strategies that users can exploit to reduce the inference cost associated with
using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As
an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM
cascade which learns which combinations of LLMs to use for different queries in
order to reduce cost and improve accuracy. Our experiments show that FrugalGPT
can match the performance of the best individual LLM (e.g. GPT-4) with up to
98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost.
The ideas and findings presented here lay a foundation for using LLMs
sustainably and efficiently.
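As a concrete illustration of the LLM cascade strategy summarized in the abstract, the sketch below queries models from cheapest to most expensive and stops as soon as a scoring function accepts an answer. It is a minimal, hedged sketch: the model names, per-query prices, thresholds, and the constant stand-in scorer are hypothetical placeholders rather than the paper's actual configuration, which learns both the routing order and the answer scorer from data.

```python
# Minimal sketch of an LLM cascade: try models in order of increasing price and
# return the first answer that a reliability scorer accepts. All names, prices,
# and thresholds below are illustrative assumptions, not FrugalGPT's settings.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CascadeStage:
    name: str                       # hypothetical model label
    cost_per_query: float           # assumed price in dollars per query
    generate: Callable[[str], str]  # wrapper around the provider's API call
    threshold: float                # accept the answer if score >= threshold


def score_answer(query: str, answer: str) -> float:
    """Stand-in for a learned reliability scorer.

    FrugalGPT trains a small model to predict whether an answer is acceptable;
    here we return a constant so the sketch runs without any trained model.
    """
    return 0.5


def cascade(query: str, stages: List[CascadeStage]) -> Optional[str]:
    """Try stages from cheapest to most expensive; stop at the first accepted answer."""
    total_cost = 0.0
    for stage in sorted(stages, key=lambda s: s.cost_per_query):
        answer = stage.generate(query)
        total_cost += stage.cost_per_query
        if score_answer(query, answer) >= stage.threshold:
            print(f"answered by {stage.name}, estimated cost ${total_cost:.4f}")
            return answer
    return None  # no stage produced an acceptable answer


if __name__ == "__main__":
    # Dummy "models" so the example runs offline.
    stages = [
        CascadeStage("cheap-model", 0.0005, lambda q: "cheap answer", threshold=0.6),
        CascadeStage("expensive-model", 0.03, lambda q: "expensive answer", threshold=0.0),
    ]
    print(cascade("What is the capital of France?", stages))
```

In this toy run the cheap model's answer is rejected (score 0.5 below the 0.6 threshold) and the query escalates to the expensive model, mirroring the cost/accuracy trade-off the cascade is designed to manage.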