Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers

Abstract

Large Language Models (LLMs) have recently been applied to reranking tasks in information retrieval, achieving strong performance. However, their high computational demands often hinder practical deployment. Existing studies evaluate the efficiency of LLM-based rerankers using proxy metrics such as latency, the number of forward passes, input tokens, and output tokens. These metrics depend on hardware and run-time choices (e.g., parallel or sequential execution, batch size) and often fail to account for model size, which makes them difficult to interpret and obscures the evaluation of the efficiency-effectiveness trade-off. To address this issue, we propose E2R-FLOPs for LLM-based rerankers: ranking metrics per PetaFLOP (RPP), which measures relevance per unit of compute, and queries per PetaFLOP (QPP), which measures hardware-agnostic throughput. Alongside the new metrics, we build an interpretable FLOPs estimator that estimates the FLOPs of an LLM-based reranker without running any experiments. Based on the proposed metrics, we conduct comprehensive experiments to evaluate a wide range of LLM-based rerankers with different architectures, studying the efficiency-effectiveness trade-off and bringing this issue to the attention of the research community.
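To make the two metrics concrete, the following is a minimal Python sketch of how RPP and QPP could be computed from per-query results. The abstract does not give the exact formulas, so the aggregation used here is an assumption: RPP is taken as the mean over queries of the ranking metric divided by that query's compute in PetaFLOPs, and QPP as the reciprocal of the mean compute per query in PetaFLOPs. The function names `rpp` and `qpp` and their parameters are hypothetical and chosen only for illustration.

```python
PETA = 1e15  # number of FLOPs in one PetaFLOP


def rpp(metric_scores, flops_per_query):
    """Ranking metric per PetaFLOP (assumed form).

    metric_scores: per-query ranking metric values (e.g. NDCG@10).
    flops_per_query: per-query FLOPs spent by the reranker.
    Returns the mean over queries of metric / PetaFLOPs.
    """
    if len(metric_scores) != len(flops_per_query) or not metric_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [m / (f / PETA) for m, f in zip(metric_scores, flops_per_query)]
    return sum(ratios) / len(ratios)


def qpp(flops_per_query):
    """Queries per PetaFLOP (assumed form).

    Returns how many queries one PetaFLOP of compute would cover,
    given the average FLOPs spent per query.
    """
    if not flops_per_query:
        raise ValueError("flops_per_query must be non-empty")
    mean_petaflops = sum(flops_per_query) / len(flops_per_query) / PETA
    return 1.0 / mean_petaflops


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the paper.
    scores = [0.5, 1.0]
    flops = [1e15, 2e15]
    print("RPP:", rpp(scores, flops))
    print("QPP:", qpp(flops))
```

Because both quantities are defined on FLOPs rather than wall-clock time, the same inputs give the same values regardless of hardware, batch size, or parallelism, which is the property the abstract emphasizes.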