HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
September 8, 2025
Authors: Duolin Sun, Dan Yang, Yue Shen, Yihan Jiao, Zhehao Tan, Jie Feng, Lianzhen Zhong, Jian Wang, Peng Wei, Jinjie Gu
cs.AI
Abstract
The Retrieval-Augmented Generation (RAG) approach enhances question-answering
systems and dialogue generation tasks by integrating information retrieval (IR)
technologies with large language models (LLMs). This strategy, which retrieves
information from external knowledge bases to strengthen the responses of
generative models, has achieved some success. However, current RAG
methods still face numerous challenges when dealing with multi-hop queries. For
instance, some approaches over-rely on iterative retrieval, wasting too many
retrieval steps on compound queries. Additionally, using the original complex
query for retrieval may fail to capture content relevant to specific
sub-queries, resulting in noisy retrieved content. Left unmanaged, this noise
can accumulate across retrieval steps. To address these issues, we
introduce HANRAG, a novel heuristic-based framework designed to efficiently
tackle problems of varying complexity. Driven by a powerful revelator, HANRAG
routes queries, decomposes them into sub-queries, and filters noise from
retrieved documents. This improves the system's adaptability and noise
resistance, allowing it to handle diverse queries effectively. We compare
the proposed framework against other leading industry methods on multiple
benchmarks. The results show that our framework achieves superior
performance on both single-hop and multi-hop question-answering tasks.
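The abstract names the revelator's three roles (routing queries, decomposing them into sub-queries, and filtering noise from retrieved documents) without giving implementation details. The following is a minimal, hypothetical Python sketch of how those roles could be wired into one retrieve-then-filter pass. It is not HANRAG's implementation. Every name here (`route`, `decompose`, `is_relevant`, `answer_context`, the toy `CORPUS`) is invented for illustration. The keyword heuristics and word-overlap retriever are simple stand-ins for the LLM-driven revelator and a real retriever.

```python
STOPWORDS = {"a", "an", "are", "in", "is", "of", "the", "what", "where", "which"}

# Hypothetical toy corpus standing in for an external knowledge base.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris.",
    "Bananas are rich in potassium.",
]


def tokens(text: str) -> set[str]:
    """Lowercased content words with surrounding punctuation and stopwords removed."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS


def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy lexical retriever: return the top-k documents by word overlap."""
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    ranked = sorted(CORPUS, key=lambda d: len(tokens(d) & tokens(query)), reverse=True)
    return ranked[:k]


# Hypothetical keyword rules standing in for the revelator's roles (not HANRAG's logic).
def route(query: str) -> str:
    """Label a query 'compound' if it joins independent questions with ' and '."""
    return "compound" if " and " in query else "single"


def decompose(query: str) -> list[str]:
    """Split a compound query into independent sub-queries."""
    return [part.strip(" ?") for part in query.split(" and ")]


def is_relevant(sub_query: str, doc: str, min_overlap: int = 2) -> bool:
    """Noise filter: keep a document only if it shares enough content words."""
    return len(tokens(sub_query) & tokens(doc)) >= min_overlap


def answer_context(query: str) -> dict[str, list[str]]:
    """Route, decompose, retrieve once per sub-query, then drop noisy documents.

    A real system would pass the surviving evidence to an LLM for generation.
    """
    sub_queries = decompose(query) if route(query) == "compound" else [query]
    return {sq: [d for d in retrieve(sq) if is_relevant(sq, d)] for sq in sub_queries}


if __name__ == "__main__":
    print(answer_context("What is the capital of France and what is the capital of Germany?"))
    print(answer_context("Where is the Eiffel Tower located?"))
```

In this sketch, a compound query costs one retrieval call per independent sub-query instead of an open-ended iterative loop. Documents that share too few content words with their sub-query are discarded before generation. For example, the Germany passage retrieved for the France sub-query never reaches the answer step. The threshold `min_overlap=2` is arbitrary.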