FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Abstract

A rapidly growing number of large language models (LLMs) can be queried for a fee. We review the cost of querying popular LLM APIs (e.g., GPT-4, ChatGPT, and J1-Jumbo) and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost of using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade that learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g., GPT-4) with up to 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
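To make the LLM cascade idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the models are tried from cheapest to most expensive, and that a response is accepted once a scoring function rates it above a per-stage threshold. The abstract does not describe these mechanics, so the names `Model`, `cascade`, `scorer`, and `thresholds`, the stub models, and the flat per-query prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical wrapper around an LLM API (stubbed here, no network calls)."""
    name: str
    cost_per_query: float          # assumed flat price per query, arbitrary units
    answer: Callable[[str], str]   # maps a query to a response


def cascade(
    query: str,
    models: List[Model],
    scorer: Callable[[str, str], float],
    thresholds: List[float],
) -> Tuple[str, str, float]:
    """Try models in the given order (assumed cheapest first).

    Returns (response, name of the model that produced it, total cost paid).
    thresholds[i] is the minimum score needed to accept the answer of models[i];
    the last model's answer is always accepted, so it needs no threshold.
    """
    total_cost = 0.0
    for i, model in enumerate(models):
        response = model.answer(query)
        total_cost += model.cost_per_query
        is_last = i == len(models) - 1
        if is_last or scorer(query, response) >= thresholds[i]:
            return response, model.name, total_cost
    raise ValueError("models must be non-empty")


# Stub models: the cheap one only knows one answer, the expensive one knows more.
cheap = Model("cheap", 1.0, lambda q: "4" if q == "2+2" else "unsure")
expensive = Model(
    "expensive", 100.0, lambda q: {"2+2": "4"}.get(q, "Paris")
)


def toy_scorer(query: str, response: str) -> float:
    """Toy reliability score: reject explicit 'unsure' answers, accept the rest."""
    return 0.0 if response == "unsure" else 1.0


easy = cascade("2+2", [cheap, expensive], toy_scorer, thresholds=[0.5])
hard = cascade("capital of France", [cheap, expensive], toy_scorer, thresholds=[0.5])
```

In this toy setup the easy query is answered by the cheap model alone, while the hard query falls through to the expensive model and pays for both calls. A practical cascade would have to fit the scorer and the thresholds on labeled data, subject to a cost budget.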
