BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs
March 19, 2026
Authors: Duyi Pan, Tianao Lou, Xin Li, Haoze Song, Yiwen Wu, Mengyi Deng, Mingyu Yang, Wei Wang
cs.AI
Abstract
Large Language Models (LLMs) exhibit hallucinations in knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising solution, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs, i.e., graphs whose schema and structure are unknown in advance. We identify three core challenges: two that cause recall loss (semantic instantiation uncertainty and structural path uncertainty) and one that causes precision loss (evidential comparison uncertainty). To address these challenges, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of the Group Steiner Tree problem, and prove it to be NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that jointly optimizes recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks demonstrate that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
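To give intuition for the "bubble expansion" step, the following is a minimal, purely illustrative sketch (not the paper's algorithm, whose details are not given here): each semantic anchor node seeds a BFS "bubble" grown to a small hop limit, and the bubbles are merged into one candidate evidence subgraph. The toy knowledge graph, function name, and parameters are all assumptions for illustration.

```python
from collections import deque

def bubble_expand(graph, anchors, max_hops=2):
    """Hypothetical sketch: grow a BFS 'bubble' around each anchor node,
    then merge the bubbles into one candidate evidence subgraph (CEG)."""
    nodes = set()
    for anchor in anchors:
        frontier = deque([(anchor, 0)])
        seen = {anchor}
        while frontier:
            node, depth = frontier.popleft()
            nodes.add(node)
            if depth == max_hops:
                continue  # stop expanding this bubble at the hop limit
            for nbr in graph.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, depth + 1))
    # induced edge set of the merged bubbles
    edges = {(u, v) for u in nodes for v in graph.get(u, []) if v in nodes}
    return nodes, edges

# toy knowledge graph as adjacency lists (entirely made up for illustration)
kg = {
    "Einstein": ["Ulm", "Physics"],
    "Ulm": ["Germany"],
    "Physics": ["Nobel Prize"],
    "Germany": [],
    "Nobel Prize": [],
}
nodes, edges = bubble_expand(kg, anchors=["Einstein"], max_hops=1)
print(sorted(nodes))  # ['Einstein', 'Physics', 'Ulm']
```

In the real problem the subgraph must connect every anchor group while staying informative (the OISR objective); this sketch only shows the local-expansion idea that generates candidates for the subsequent composite ranking.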