LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Abstract

In the traditional RAG framework, the basic retrieval units are normally short. Common retrievers such as DPR typically work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the "needle" unit. In contrast, the readers only need to extract answers from the short retrieved units. This imbalanced design, with a "heavy" retriever and a "light" reader, can lead to sub-optimal performance. To alleviate the imbalance, we propose a new framework, LongRAG, consisting of a "long retriever" and a "long reader". LongRAG processes the entire Wikipedia into 4K-token units, which are 30x longer than before. By increasing the unit size, we reduce the total number of units from 22M to 700K. This significantly lowers the burden on the retriever and leads to a remarkable retrieval score: answer recall@1 = 71% on NQ (previously 52%) and answer recall@2 = 72% on HotpotQA (full-wiki) (previously 47%). We then feed the top-k retrieved units (approximately 30K tokens) to an existing long-context LLM to perform zero-shot answer extraction. Without requiring any training, LongRAG achieves an EM of 62.7% on NQ, which is the best known result. LongRAG also achieves 64.3% on HotpotQA (full-wiki), which is on par with the SoTA model. Our study offers insights into the future roadmap for combining RAG with long-context LLMs.
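The pipeline described in the abstract (fewer, longer retrieval units, then a long-context reader over the top-k units) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the greedy packing strategy, the whitespace token count, the lexical-overlap score standing in for a real retriever, and all function names (pack_into_long_units, retrieve_top_k, build_reader_prompt) are hypothetical.

```python
from collections import Counter


def count_tokens(text: str) -> int:
    # Whitespace split as a crude stand-in for a real tokenizer (assumption).
    return len(text.split())


def pack_into_long_units(passages: list[str], max_tokens: int) -> list[str]:
    """Greedily merge consecutive short passages into units of at most max_tokens.

    A single passage longer than max_tokens is kept as its own oversized unit.
    """
    units: list[str] = []
    current: list[str] = []
    current_len = 0
    for passage in passages:
        n = count_tokens(passage)
        # Close the current unit if adding this passage would exceed the budget.
        if current and current_len + n > max_tokens:
            units.append(" ".join(current))
            current, current_len = [], 0
        current.append(passage)
        current_len += n
    if current:
        units.append(" ".join(current))
    return units


def overlap_score(query: str, unit: str) -> int:
    # Toy lexical overlap; a placeholder for a learned retriever's similarity.
    q = Counter(query.lower().split())
    u = Counter(unit.lower().split())
    return sum(min(q[t], u[t]) for t in q)


def retrieve_top_k(query: str, units: list[str], k: int) -> list[str]:
    # Rank the (few, long) units and keep the k best-scoring ones.
    return sorted(units, key=lambda u: overlap_score(query, u), reverse=True)[:k]


def build_reader_prompt(query: str, retrieved: list[str]) -> str:
    # Concatenate the retrieved long units into one context for a
    # long-context LLM; the prompt wording here is illustrative only.
    context = "\n\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    passages = ["a b c", "d e f", "g h i", "j k l"]
    units = pack_into_long_units(passages, max_tokens=6)
    top = retrieve_top_k("where is h", units, k=1)
    print(len(units), top)
    print(build_reader_prompt("where is h", top))
```

With the figures quoted in the abstract, going from 22M units to 700K is a reduction of roughly 31x, consistent with units that are about 30x longer, and a reader budget of about 30K tokens holds around seven or eight 4K-token units.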
