SAGE: A Framework of Precise Retrieval for RAG

Abstract

Retrieval-augmented generation (RAG) has demonstrated significant proficiency in question-answering (QA) tasks within a specified corpus. Nonetheless, RAG still fails on numerous QA instances. These failures are not solely attributable to the limitations of large language models (LLMs); instead, they predominantly arise because inaccurate information is retrieved for the LLM, owing to two limitations: (1) Current RAG methods segment the corpus without considering semantics, which impairs the correlation between questions and segments and makes relevant context difficult to find. (2) There is a trade-off between missing essential context when less context is retrieved and including irrelevant context when more is retrieved. In this paper, we introduce a RAG framework, SAGE, to overcome these limitations. First, to address segmentation that ignores semantics, we propose training a semantic segmentation model that segments the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved while irrelevant ones are ignored, we design a chunk selection algorithm that dynamically selects chunks based on how quickly the relevance score decreases, leading to a more relevant selection. Third, to further ensure the precision of the retrieved chunks, we propose letting LLMs assess whether the retrieved chunks are excessive or lacking and then adjust the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% on average in QA quality. Moreover, by avoiding the retrieval of noisy context, SAGE lowers the cost of the tokens consumed in LLM inference and achieves a 49.41% improvement in cost efficiency on average. Additionally, our work offers valuable insights for boosting RAG.
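To make the second idea concrete, the following is a minimal sketch of selecting chunks by how quickly the relevance score falls off. It is an illustration under stated assumptions, not the paper's actual algorithm: the function name `select_chunks`, the parameters `min_k` and `max_drop`, and the use of a relative drop between consecutive scores are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def select_chunks(
    scored_chunks: List[Tuple[str, float]],
    min_k: int = 1,
    max_drop: float = 0.3,
) -> List[str]:
    """Keep top-ranked chunks until the relevance score drops sharply.

    scored_chunks: (chunk_text, relevance_score) pairs, in any order.
    min_k: hypothetical floor on the number of chunks to keep.
    max_drop: hypothetical threshold on the relative score drop between
        consecutive ranked chunks; a larger drop ends the selection.
    """
    # Rank chunks from most to least relevant.
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)

    selected: List[str] = []
    prev_score: Optional[float] = None
    for text, score in ranked:
        # Only start checking the drop once the floor of min_k is met.
        if prev_score is not None and len(selected) >= min_k:
            if prev_score > 0:
                drop = (prev_score - score) / prev_score
            else:
                drop = 1.0  # treat non-positive scores as a full drop
            if drop > max_drop:
                break  # scores fell off a cliff; stop retrieving
        selected.append(text)
        prev_score = score
    return selected


if __name__ == "__main__":
    candidates = [("a", 0.92), ("b", 0.88), ("c", 0.85), ("d", 0.40), ("e", 0.38)]
    print(select_chunks(candidates))
```

With the made-up scores above, the selection stops at the sharp fall from 0.85 to 0.40, so the number of chunks adapts to the query rather than being a fixed top-k. The LLM feedback step described in the abstract could then raise or lower a parameter such as `max_drop`, though how SAGE actually performs that adjustment is not specified here.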
