FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Abstract

A rapidly growing number of large language models (LLMs) can be queried for a fee. We review the cost of querying popular LLM APIs, e.g., GPT-4, ChatGPT, and J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost of using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade that learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g., GPT-4) with up to 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
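The abstract names the LLM cascade strategy without spelling out its mechanics. The sketch below illustrates the general idea under stated assumptions: models are tried from cheapest to most expensive, and a scoring function decides whether an answer is good enough to stop. The names `Tier`, `cascade`, `toy_scorer`, the two stand-in models, and all cost and threshold values are hypothetical and chosen only for illustration; this is not the FrugalGPT implementation, in which the model ordering, scorer, and thresholds would be learned from data rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str                     # label for the model (hypothetical)
    cost: float                   # assumed cost per query, arbitrary units
    answer: Callable[[str], str]  # stand-in for a real LLM API call
    threshold: float              # accept the answer if its score >= threshold


def cascade(
    query: str,
    tiers: List[Tier],
    scorer: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Query tiers in order; stop at the first answer the scorer accepts.

    Returns (answer, name of the tier that produced it, total cost spent).
    The last tier's answer is always accepted, since nothing remains to try.
    """
    spent = 0.0
    for i, tier in enumerate(tiers):
        response = tier.answer(query)
        spent += tier.cost
        is_last = i == len(tiers) - 1
        if is_last or scorer(query, response) >= tier.threshold:
            return response, tier.name, spent
    raise ValueError("tiers must not be empty")


# Toy stand-ins for a cheap and an expensive model (assumptions, not real APIs).
def cheap_model(query: str) -> str:
    return "4" if query == "2+2" else "unsure"


def strong_model(query: str) -> str:
    known = {"2+2": "4", "capital of France": "Paris"}
    return known.get(query, "unknown")


# Toy scorer: treats "unsure" as unreliable and anything else as reliable.
def toy_scorer(query: str, response: str) -> float:
    return 0.0 if response == "unsure" else 1.0


tiers = [
    Tier("cheap", 1.0, cheap_model, threshold=0.5),
    Tier("strong", 50.0, strong_model, threshold=0.0),
]

easy = cascade("2+2", tiers, toy_scorer)
hard = cascade("capital of France", tiers, toy_scorer)
print(easy)
print(hard)
```

In this toy setup the easy query is answered by the cheap tier alone, while the hard query falls through to the expensive tier and pays for both calls. Any saving therefore depends on how many queries the cheaper tiers can resolve reliably, and on how well the scorer separates good answers from bad ones.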
