FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Abstract

There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost of querying popular LLM APIs, such as GPT-4, ChatGPT, and J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost of using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade that learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g., GPT-4) with up to 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
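To make the cascade idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed cheapest-first ordering, a hand-written scoring function, and hand-set acceptance thresholds, whereas the abstract says FrugalGPT learns which combinations of LLMs to use. The names `Model`, `cascade`, and `toy_scorer`, the stub models, and the cost figures are all hypothetical; the 1-to-100 cost ratio only mirrors the "two orders of magnitude" price gap mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_query: float         # illustrative units, not real API prices
    answer: Callable[[str], str]  # stand-in for a call to an LLM API
    threshold: float              # accept the answer if its score reaches this


def cascade(
    query: str,
    models: List[Model],
    scorer: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Try models in the given order and stop at the first accepted answer.

    Returns (model name, answer, total cost spent). The last model's answer
    is always accepted, since there is nothing left to fall back on.
    """
    spent = 0.0
    for i, model in enumerate(models):
        answer = model.answer(query)
        spent += model.cost_per_query
        is_last = i == len(models) - 1
        if is_last or scorer(query, answer) >= model.threshold:
            return model.name, answer, spent
    raise ValueError("models must not be empty")


def toy_scorer(query: str, answer: str) -> float:
    # Hypothetical reliability score: distrust an explicit "unsure".
    return 0.1 if answer == "unsure" else 0.9


# Stub models: the cheap one only handles one easy query.
small = Model("small", 1.0, lambda q: "4" if q == "2+2" else "unsure", 0.5)
large = Model("large", 100.0, lambda q: "42", 0.0)

easy = cascade("2+2", [small, large], toy_scorer)
hard = cascade("hard question", [small, large], toy_scorer)
print(easy)
print(hard)
```

In this toy setup the easy query is answered by the cheap model alone, while the hard query pays for both calls. A cascade therefore saves money only when enough queries are accepted early.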
