HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Abstract

The Retrieval-Augmented Generation (RAG) approach enhances question-answering systems and dialogue generation tasks by integrating information retrieval (IR) techniques with large language models (LLMs). This strategy, which retrieves information from external knowledge bases to strengthen the responses of generative models, has had some success. However, current RAG methods still face several challenges with multi-hop queries. Some approaches rely too heavily on iterative retrieval and waste retrieval steps on compound queries. In addition, retrieving with the original complex query may miss content relevant to specific sub-queries and return noisy results. If this noise is not managed, it can lead to noise accumulation. To address these issues, we introduce HANRAG, a novel heuristic-based framework designed to efficiently tackle problems of varying complexity. Driven by a powerful revelator, HANRAG routes queries, decomposes them into sub-queries, and filters noise from retrieved documents. This improves the system's adaptability and noise resistance, so it can handle diverse queries. We compare the proposed framework against other leading industry methods across various benchmarks. The results show that our framework achieves superior performance on both single-hop and multi-hop question-answering tasks.
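
The abstract names three jobs for the revelator: routing queries, decomposing them into sub-queries, and filtering noise from retrieved documents. As a rough illustration only, the Python sketch below wires those three steps around a toy keyword retriever. The function names (`route`, `decompose`, `retrieve`, `filter_noise`, `gather_context`), the corpus, the split-on-"and" rule, and the overlap threshold are all assumptions made for this example. None of them come from the paper, and the abstract does not specify how the revelator performs these steps.

```python
import re

# Toy corpus; the contents are made up for illustration.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Bananas are rich in potassium.",
]

STOPWORDS = {"a", "and", "are", "in", "is", "of", "the", "what"}


def tokens(text: str) -> set[str]:
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def route(query: str) -> str:
    """Hypothetical router: treat ' and ' as the mark of a compound query."""
    return "compound" if re.search(r"\sand\s", query, re.IGNORECASE) else "single"


def decompose(query: str) -> list[str]:
    """Hypothetical decomposition: split a compound query on ' and '."""
    return [part.strip() for part in re.split(r"\s+and\s+", query, flags=re.IGNORECASE)]


def retrieve(sub_query: str, k: int = 2) -> list[tuple[str, int]]:
    """Rank documents by keyword overlap with the sub-query; return the top k."""
    q = tokens(sub_query)
    scored = [(doc, len(q & tokens(doc))) for doc in CORPUS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def filter_noise(hits: list[tuple[str, int]], min_overlap: int = 2) -> list[str]:
    """Drop retrieved documents whose overlap score is below the threshold."""
    return [doc for doc, score in hits if score >= min_overlap]


def gather_context(query: str) -> dict[str, list[str]]:
    """Route, decompose if needed, then retrieve and filter per sub-query."""
    sub_queries = decompose(query) if route(query) == "compound" else [query]
    return {sq: filter_noise(retrieve(sq)) for sq in sub_queries}


if __name__ == "__main__":
    query = "What is the capital of France and what is the capital of Germany?"
    for sub_query, docs in gather_context(query).items():
        print(sub_query, "->", docs)
```

Retrieving per sub-query is the point of the sketch: each sub-query is scored only against its own keywords, so a document that matches the other sub-query falls below the threshold and is filtered out instead of being carried forward.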
