SAGE: A Framework of Precise Retrieval for RAG
March 3, 2025
Authors: Jintao Zhang, Guoliang Li, Jinyang Su
cs.AI
Abstract
Retrieval-augmented generation (RAG) has demonstrated significant proficiency in question-answering (QA) tasks over a specified corpus. Nonetheless, numerous failure cases of RAG in QA still exist. These failures are not solely attributable to the limitations of Large Language Models (LLMs); instead, they predominantly arise from retrieving inaccurate information for the LLMs, due to two limitations: (1) Current RAG methods segment the corpus without considering semantics, making it difficult to find relevant context because the correlation between questions and segments is impaired. (2) There is a trade-off between missing essential context when less context is retrieved and including irrelevant context when more context is retrieved.

In this paper, we introduce a RAG framework (SAGE) to overcome these limitations. First, to address the problem of segmentation that ignores semantics, we propose training a semantic segmentation model that splits the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved while irrelevant ones are ignored, we design a chunk selection algorithm that dynamically selects chunks based on how quickly the relevance score decreases, leading to a more precise selection. Third, to further ensure the precision of the retrieved chunks, we propose letting LLMs assess whether the retrieved chunks are excessive or insufficient and then adjusting the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% in QA quality on average. Moreover, by avoiding the retrieval of noisy context, SAGE lowers the cost of tokens consumed in LLM inference and achieves a 49.41% improvement in cost efficiency on average. Additionally, our work offers valuable insights for improving RAG.
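To make the first component concrete: the abstract describes a trained model that splits the corpus into semantically complete chunks. The sketch below is not that model; it is a rough stand-in that approximates semantics-aware segmentation by starting a new chunk whenever the embedding similarity between adjacent sentences drops below a threshold. The `embed` callable and the `sim_threshold` value are illustrative assumptions, not taken from the paper.

```python
import re

def semantic_segment(text, embed, sim_threshold=0.6):
    """Rough semantics-aware splitter (a stand-in for SAGE's trained
    segmentation model): start a new chunk whenever the embedding
    similarity between adjacent sentences drops below a threshold."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        a, b = embed(prev), embed(sent)
        # Cosine similarity between adjacent sentence embeddings.
        sim = sum(x * y for x, y in zip(a, b)) / (
            (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5) + 1e-9
        )
        if sim < sim_threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```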
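The second component selects chunks dynamically based on how fast the relevance score falls off. Below is a minimal sketch of that idea, assuming a relative-drop threshold; the function name `select_chunks` and the `drop_ratio` parameter are illustrative, not the paper's implementation.

```python
def select_chunks(chunks, scores, drop_ratio=0.5):
    """Keep the top-ranked chunks until the relative drop between
    consecutive relevance scores exceeds drop_ratio."""
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0][0]]
    for i in range(1, len(ranked)):
        prev_score, score = ranked[i - 1][1], ranked[i][1]
        # Stop once the relevance score starts falling off sharply.
        if prev_score > 0 and (prev_score - score) / prev_score > drop_ratio:
            break
        selected.append(ranked[i][0])
    return selected

# Example: the score falls sharply after the second chunk, so only two are kept.
chunks = ["c1", "c2", "c3", "c4"]
scores = [0.92, 0.88, 0.41, 0.10]
print(select_chunks(chunks, scores))  # ['c1', 'c2']
```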
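The third component has the LLM judge whether the retrieved context is excessive or insufficient and adjusts the amount of context accordingly. A hedged sketch of such a feedback loop follows, where `retriever` and `ask_llm` are assumed callables and the prompt wording is invented for illustration; the paper's actual feedback procedure may differ.

```python
def retrieve_with_feedback(question, retriever, ask_llm, k=5, max_rounds=3):
    """Ask the LLM whether the retrieved context is excessive or
    insufficient, then adjust the number of retrieved chunks."""
    context = retriever(question, k)
    for _ in range(max_rounds):
        verdict = ask_llm(
            f"Question: {question}\nContext: {context}\n"
            "Is this context EXCESSIVE, INSUFFICIENT, or ADEQUATE for answering?"
        )
        if "EXCESSIVE" in verdict:
            k = max(1, k - 1)          # retrieve fewer chunks next round
        elif "INSUFFICIENT" in verdict:
            k += 1                     # retrieve more chunks next round
        else:
            break                      # context judged adequate
        context = retriever(question, k)
    return context
```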