arXiv: 2511.10240v1

ProgRAG: Hallucination-Resistant Progressive Retrieval and Reasoning over Knowledge Graphs

November 13, 2025
Authors: Minbae Park, Hyemin Yang, Jeonghyun Kim, Kunsoo Park, Hyunjoon Kim
cs.AI, cs.CL

Abstract

Large Language Models (LLMs) demonstrate strong reasoning capabilities but struggle with hallucinations and limited transparency. Recently, KG-enhanced LLMs that integrate knowledge graphs (KGs) have been shown to improve reasoning performance, particularly for complex, knowledge-intensive tasks. However, these methods still face significant challenges, including inaccurate retrieval and reasoning failures, often exacerbated by long input contexts that obscure relevant information or by context constructions that struggle to capture the richer logical directions required by different question types. Furthermore, many of these approaches rely on LLMs to directly retrieve evidence from KGs, and to self-assess the sufficiency of this evidence, which often results in premature or incorrect reasoning. To address the retrieval and reasoning failures, we propose ProgRAG, a multi-hop knowledge graph question answering (KGQA) framework that decomposes complex questions into sub-questions, and progressively extends partial reasoning paths by answering each sub-question. At each step, external retrievers gather candidate evidence, which is then refined through uncertainty-aware pruning by the LLM. Finally, the context for LLM reasoning is optimized by organizing and rearranging the partial reasoning paths obtained from the sub-question answers. Experiments on three well-known datasets demonstrate that ProgRAG outperforms existing baselines in multi-hop KGQA, offering improved reliability and reasoning quality.
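
The progressive loop described in the abstract (answer one sub-question at a time, retrieve candidates, prune them, extend partial paths, then assemble the context) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy graph, the function names `retrieve`, `prune`, `extend_paths`, and `format_context`, the representation of each sub-question as a dictionary of relation relevance scores, and the entropy threshold used as a stand-in for uncertainty-aware pruning. In the framework itself, the abstract says pruning is performed by the LLM and retrieval by external retrievers.

```python
import math

# Toy knowledge graph as (head, relation, tail) triples; purely illustrative.
KG = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "released_in", "2010"),
    ("Christopher Nolan", "born_in", "London"),
    ("Christopher Nolan", "directed", "Inception"),
]

def retrieve(entity, relation_scores):
    """Hypothetical external retriever: score each outgoing triple of `entity`."""
    return [(t, relation_scores.get(t[1], 0.0)) for t in KG if t[0] == entity]

def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune(candidates, threshold=0.5):
    """Assumed pruning rule: keep only the top candidate when the normalized
    score distribution is confident (low entropy); otherwise keep every
    candidate with a nonzero score."""
    total = sum(score for _, score in candidates)
    if total == 0:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    probs = [score / total for _, score in ranked]
    if entropy(probs) <= threshold:
        return [ranked[0][0]]
    return [triple for triple, score in ranked if score > 0]

def extend_paths(topic_entity, sub_questions):
    """Extend partial reasoning paths by one hop per sub-question."""
    paths = [[]]
    for relation_scores in sub_questions:
        new_paths = []
        for path in paths:
            # Continue from the tail of the last triple, or from the topic entity.
            frontier = path[-1][2] if path else topic_entity
            for triple in prune(retrieve(frontier, relation_scores)):
                new_paths.append(path + [triple])
        paths = new_paths
    return paths

def format_context(paths):
    """Serialize each path as one line of text for the final reasoning prompt."""
    return "\n".join(
        " ; ".join(f"{h} -> {r} -> {t}" for h, r, t in path) for path in paths
    )

# "Where was the director of Inception born?" split into two assumed sub-questions,
# each represented by hand-picked relation relevance scores.
sub_questions = [
    {"directed_by": 0.9, "released_in": 0.1},
    {"born_in": 0.9, "directed": 0.1},
]
paths = extend_paths("Inception", sub_questions)
print(format_context(paths))
```

Under these assumptions, a confident score distribution keeps a single triple per hop, while an ambiguous one (for example two candidates with equal scores) keeps both, so the set of partial paths widens only where the evidence is uncertain.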