SAGE: A Framework of Precise Retrieval for RAG

Abstract

Retrieval-augmented generation (RAG) has proven effective for question answering (QA) over a specified corpus, yet RAG still fails on many QA instances. These failures are not solely attributable to the limitations of large language models (LLMs); they arise predominantly because inaccurate information is retrieved for the LLM, owing to two limitations: (1) Current RAG methods segment the corpus without considering semantics, which impairs the correlation between questions and segments and makes relevant context hard to find. (2) There is a trade-off between missing essential context when less context is retrieved and including irrelevant context when more is retrieved. In this paper, we introduce SAGE, a RAG framework that overcomes these limitations. First, to address semantics-unaware segmentation, we train a semantic segmentation model that segments the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved and irrelevant ones are ignored, we design a chunk selection algorithm that dynamically selects chunks based on how quickly the relevance score decreases. Third, to further ensure the precision of the retrieved chunks, we let the LLM assess whether the retrieved chunks are excessive or lacking and adjust the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% on average in QA quality. Moreover, by avoiding noisy context, SAGE lowers the cost of tokens consumed in LLM inference and improves cost efficiency by 49.41% on average. Our work also offers insights for improving RAG.
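
To make the second and third ideas more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "decreasing speed of the relevance score" can be approximated by the relative drop between consecutive ranked scores. The names `select_chunks`, `adjust_with_feedback`, `min_k`, `drop_ratio`, `max_rounds`, and the `judge` callable (a stand-in for an LLM call that returns "ok", "lacking", or "excessive") are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Scored = Sequence[Tuple[str, float]]  # (chunk text, relevance score)


def select_chunks(scored: Scored, min_k: int = 1, drop_ratio: float = 0.3) -> List[str]:
    """Keep top-ranked chunks until the relevance score drops sharply.

    Assumption: selection stops at the first chunk whose score falls more
    than `drop_ratio` (relative) below the previous chunk's score.
    """
    min_k = max(1, min_k)
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    selected = [chunk for chunk, _ in ranked[:min_k]]
    # Walk consecutive (previous, current) pairs starting after the first min_k.
    for (_, prev), (chunk, cur) in zip(ranked[min_k - 1:], ranked[min_k:]):
        if prev <= 0 or (prev - cur) / prev > drop_ratio:
            break
        selected.append(chunk)
    return selected


def adjust_with_feedback(
    scored: Scored,
    judge: Callable[[List[str]], str],
    max_rounds: int = 3,
) -> List[str]:
    """Grow or shrink the selected context based on a judge's verdict.

    `judge` is a hypothetical stand-in for asking an LLM whether the
    retrieved chunks are "lacking", "excessive", or "ok".
    """
    ranked = [c for c, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
    k = len(select_chunks(scored))
    for _ in range(max_rounds):
        verdict = judge(ranked[:k])
        if verdict == "lacking" and k < len(ranked):
            k += 1
        elif verdict == "excessive" and k > 1:
            k -= 1
        else:
            break
    return ranked[:k]


if __name__ == "__main__":
    scores = [("a", 0.92), ("b", 0.88), ("c", 0.85), ("d", 0.40), ("e", 0.38)]
    print(select_chunks(scores))
    # Toy judge: asks for more context until four chunks are present.
    print(adjust_with_feedback(scores, lambda cs: "lacking" if len(cs) < 4 else "ok"))
```

In this toy input, the sharp drop between "c" and "d" is where selection stops, so the number of retrieved chunks adapts to the score profile instead of being a fixed top-k. The feedback step then adjusts that number by one chunk per round.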
