arXiv: 2511.10240v1
ProgRAG: Hallucination-Resistant Progressive Retrieval and Reasoning over Knowledge Graphs
November 13, 2025
Authors: Minbae Park, Hyemin Yang, Jeonghyun Kim, Kunsoo Park, Hyunjoon Kim
cs.AI; cs.CL
Abstract
Large Language Models (LLMs) demonstrate strong reasoning capabilities but struggle with hallucinations and limited transparency. Recently, KG-enhanced LLMs that integrate knowledge graphs (KGs) have been shown to improve reasoning performance, particularly for complex, knowledge-intensive tasks. However, these methods still face significant challenges, including inaccurate retrieval and reasoning failures, often exacerbated by long input contexts that obscure relevant information or by context constructions that struggle to capture the richer logical directions required by different question types. Furthermore, many of these approaches rely on LLMs to retrieve evidence directly from KGs and to self-assess the sufficiency of this evidence, which often results in premature or incorrect reasoning. To address these retrieval and reasoning failures, we propose ProgRAG, a multi-hop knowledge graph question answering (KGQA) framework that decomposes complex questions into sub-questions and progressively extends partial reasoning paths by answering each sub-question. At each step, external retrievers gather candidate evidence, which is then refined through uncertainty-aware pruning by the LLM. Finally, the context for LLM reasoning is optimized by organizing and rearranging the partial reasoning paths obtained from the sub-question answers. Experiments on three well-known datasets demonstrate that ProgRAG outperforms existing baselines in multi-hop KGQA, offering improved reliability and reasoning quality.
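The progressive loop described in the abstract (decompose, retrieve, prune, extend, then assemble the context) can be illustrated with a small Python sketch. This is a hypothetical toy, not the authors' implementation. Every element of it is an assumption: the toy KG, the hard-coded sub-questions (ProgRAG obtains them from an LLM), the word-overlap scorer standing in for LLM judgment, the rule of one hop per sub-question, and the entropy-threshold pruning rule. The function names `retrieve`, `relevance`, `prune`, `extend`, `build_context`, and `prog_rag` and the parameters `k`, `threshold`, and `temperature` are invented for illustration. The abstract does not say how uncertainty is measured, so normalized entropy serves here only as one plausible stand-in.

```python
import math

Triple = tuple[str, str, str]  # (head, relation, tail)
Path = tuple[str, ...]         # entity, relation, entity, relation, entity, ...

# Toy KG: hypothetical data for illustration only.
KG: list[Triple] = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("Christopher Nolan", "spouse", "Emma Thomas"),
    ("London", "country", "United Kingdom"),
]


def retrieve(frontier: set[str], kg: list[Triple]) -> list[Triple]:
    """Stand-in for an external retriever: every triple leaving a frontier entity."""
    return [t for t in kg if t[0] in frontier]


def relevance(sub_question: str, triple: Triple) -> float:
    """ASSUMPTION: word overlap with the relation name replaces LLM scoring."""
    q_words = set(sub_question.lower().replace("?", "").split())
    return float(len(q_words & set(triple[1].split("_"))))


def prune(sub_question: str, cands: list[Triple], k: int = 2,
          threshold: float = 0.5, temperature: float = 0.2) -> list[Triple]:
    """ASSUMPTION: keep only the top candidate when the score distribution is
    confident (low normalized entropy); otherwise keep the top-k candidates."""
    if len(cands) <= 1:
        return cands
    logits = [relevance(sub_question, c) / temperature for c in cands]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    norm_entropy = entropy / math.log(len(probs))  # scaled to [0, 1]
    ranked = [c for _, c in sorted(zip(probs, cands), key=lambda pc: -pc[0])]
    return ranked[:1] if norm_entropy < threshold else ranked[:k]


def extend(paths: list[Path], kept: list[Triple]) -> list[Path]:
    """Extend each partial path whose last entity is the head of a kept triple."""
    return [p + (r, t) for p in paths for (h, r, t) in kept if h == p[-1]]


def build_context(paths: list[Path]) -> str:
    """ASSUMPTION: dedupe paths, list longer ones first, render each as a chain."""
    unique = sorted(set(paths), key=lambda p: (-len(p), p))
    return "\n".join(
        " -> ".join(p[i] if i % 2 == 0 else f"[{p[i]}]" for i in range(len(p)))
        for p in unique
    )


def prog_rag(topic: str, sub_questions: list[str], kg: list[Triple]) -> list[Path]:
    paths: list[Path] = [(topic,)]
    for sq in sub_questions:  # ASSUMPTION: one hop per sub-question
        frontier = {p[-1] for p in paths}
        kept = prune(sq, retrieve(frontier, kg))
        paths = extend(paths, kept) or paths  # keep old paths if nothing extends
    return paths


# Decomposition is hard-coded here; ProgRAG would obtain sub-questions from an LLM.
subs = ["Who directed Inception?", "Where was he born?", "Which country is it in?"]
paths = prog_rag("Inception", subs, KG)
print(build_context(paths))
```

With this assumed rule, a confident (low-entropy) score distribution commits to a single branch. A flat distribution instead keeps the top `k` candidates, so an uncertain step does not prematurely discard the correct evidence. For this toy input the script prints the single path `Inception -> [directed_by] -> Christopher Nolan -> [born_in] -> London -> [country] -> United Kingdom`.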