ECoRAG: Evidentiality-guided Compression for Long Context RAG

Abstract

Large Language Models (LLMs) have shown remarkable performance in Open-Domain Question Answering (ODQA) by leveraging external documents through Retrieval-Augmented Generation (RAG). Context compression is necessary to reduce the overhead that RAG incurs from longer contexts. However, prior compression methods do not focus on filtering out non-evidential information, which limits performance in LLM-based RAG. We therefore propose Evidentiality-guided RAG, the ECoRAG framework. ECoRAG improves LLM performance by compressing retrieved documents based on evidentiality, ensuring that answer generation is supported by the correct evidence. As an additional step, ECoRAG reflects on whether the compressed content provides sufficient evidence and, if it does not, retrieves more documents until the evidence is sufficient. Experiments show that ECoRAG improves LLM performance on ODQA tasks, outperforming existing compression methods. Furthermore, ECoRAG is highly cost-efficient: it not only reduces latency but also minimizes token usage by retaining only the information needed to generate the correct answer. Code is available at https://github.com/ldilab/ECoRAG.
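The compress, reflect, and retrieve-more loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function `compress_until_sufficient`, its parameters, and the toy helpers `toy_retrieve`, `toy_score`, and `toy_sufficient` are all hypothetical names introduced here, and word overlap merely stands in for a learned evidentiality score.

```python
from typing import Callable, List

# Toy corpus, listed in the order a (hypothetical) retriever would return it.
CORPUS = [
    "Bananas are yellow.",
    "France is in Europe.",
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]


def toy_retrieve(question: str, k: int) -> List[str]:
    """Stand-in retriever: returns the first k corpus entries."""
    return CORPUS[:k]


def toy_score(question: str, passage: str) -> float:
    """Stand-in evidentiality score: fraction of question words found in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().replace(".", "").split())
    return len(q_words & p_words) / len(q_words)


def toy_sufficient(question: str, kept: List[str]) -> bool:
    """Stand-in reflection step: sufficient if some kept passage covers the question."""
    return any(toy_score(question, p) >= 0.9 for p in kept)


def compress_until_sufficient(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    is_sufficient: Callable[[str, List[str]], bool],
    k_start: int = 2,
    k_step: int = 2,
    k_max: int = 10,
    threshold: float = 0.5,
) -> List[str]:
    """Keep only passages scoring above the threshold; widen retrieval if evidence is lacking."""
    kept: List[str] = []
    k = k_start
    while k <= k_max:
        passages = retrieve(question, k)
        # Compression: drop passages whose evidentiality score is below the threshold.
        kept = [p for p in passages if score(question, p) >= threshold]
        # Reflection: stop as soon as the compressed context is judged sufficient.
        if is_sufficient(question, kept):
            return kept
        k += k_step  # Otherwise retrieve more and try again.
    return kept  # Budget exhausted; return the best effort.


evidence = compress_until_sufficient(
    "capital of France", toy_retrieve, toy_score, toy_sufficient
)
print(evidence)
```

In this toy run the first two retrieved passages contain no sufficient evidence, so the loop widens retrieval to four passages and keeps only the one that answers the question.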
