ECoRAG: Evidentiality-guided Compression for Long Context RAG

Abstract

Large Language Models (LLMs) have shown remarkable performance in Open-Domain Question Answering (ODQA) by leveraging external documents through Retrieval-Augmented Generation (RAG). To reduce the RAG overhead that comes from longer contexts, context compression is necessary. However, prior compression methods do not focus on filtering out non-evidential information, which limits performance in LLM-based RAG. We thus propose Evidentiality-guided RAG, or the ECoRAG framework. ECoRAG improves LLM performance by compressing retrieved documents based on evidentiality, ensuring that answer generation is supported by the correct evidence. As an additional step, ECoRAG reflects on whether the compressed content provides sufficient evidence and, if not, retrieves more until the evidence is sufficient. Experiments show that ECoRAG improves LLM performance on ODQA tasks, outperforming existing compression methods. Furthermore, ECoRAG is highly cost-efficient, as it not only reduces latency but also minimizes token usage by retaining only the information necessary to generate the correct answer. Code is available at https://github.com/ldilab/ECoRAG.
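
The compress-then-reflect loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`overlap_score`, `coverage_is_sufficient`, `compress_with_reflection`), the word-overlap scorer, and the 0.75 coverage threshold are all assumptions made for illustration, standing in for a learned evidentiality scorer and a learned sufficiency judge. The sketch also simplifies "retrieves more" to "adds the next-best ranked sentences".

```python
import re
from typing import Callable, List, Sequence, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens; a toy tokenizer for this sketch only."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(question: str, sentence: str) -> float:
    """Hypothetical evidentiality proxy: fraction of question words found in the sentence."""
    q = tokens(question)
    return len(q & tokens(sentence)) / max(len(q), 1)


def coverage_is_sufficient(question: str, kept: List[str], threshold: float = 0.75) -> bool:
    """Hypothetical sufficiency check: do the kept sentences cover enough question words?"""
    q = tokens(question)
    covered: Set[str] = set()
    for sentence in kept:
        covered |= tokens(sentence) & q
    return len(covered) / max(len(q), 1) >= threshold


def compress_with_reflection(
    question: str,
    sentences: Sequence[str],
    score: Callable[[str, str], float] = overlap_score,
    is_sufficient: Callable[[str, List[str]], bool] = coverage_is_sufficient,
    step: int = 1,
) -> List[str]:
    """Keep the top-scoring sentences, adding more until the evidence is judged sufficient."""
    # Rank candidates by the (assumed) evidentiality score, best first.
    ranked = sorted(sentences, key=lambda s: score(question, s), reverse=True)
    kept: List[str] = []
    for start in range(0, len(ranked), step):
        kept.extend(ranked[start:start + step])
        # Reflection step: stop as soon as the compressed context is judged sufficient.
        if is_sufficient(question, kept):
            break
    return kept


if __name__ == "__main__":
    docs = [
        "The weather in Oregon is often rainy.",
        "Frank Herbert wrote the novel Dune in 1965.",
        "Dune is set on the desert planet Arrakis.",
    ]
    print(compress_with_reflection("Who wrote the novel Dune?", docs))
```

In this toy run only the sentence about Frank Herbert is kept, because it alone passes the sufficiency check. If no subset passes, the loop falls back to returning every sentence in ranked order, so compression never discards evidence it cannot replace.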
