BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Abstract

Large Language Models (LLMs) are prone to hallucination in knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising remedy, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs, that is, graphs whose schema and structure are unknown in advance. We identify three core challenges: two cause recall loss (semantic instantiation uncertainty and structural path uncertainty), and one causes precision loss (evidential comparison uncertainty). To address them, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of the Group Steiner Tree problem, and prove that it is NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes for both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks show that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
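OISR is described as a variant of the Group Steiner Tree problem: given several groups of candidate vertices (for example, entities matched to each concept in a question), find a small tree that touches at least one vertex from every group. The following is NOT BubbleRAG's bubble expansion, whose details are not given in this abstract; it is a minimal sketch of the textbook shortest-path "star" heuristic for unweighted Group Steiner Tree, assuming an undirected knowledge graph stored as a Python adjacency dict and anchor groups given as sets of entity names. All function names and data are hypothetical.

```python
from collections import deque


def bfs_tree(graph, root):
    """BFS from root over an undirected adjacency dict; returns (parent, dist) maps."""
    parent, dist = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return parent, dist


def tree_path(parent, target):
    """Edges on the BFS-tree path from target back to the root."""
    edges = set()
    while parent[target] is not None:
        edges.add(frozenset((target, parent[target])))
        target = parent[target]
    return edges


def group_steiner_star(graph, groups):
    """Star heuristic for unweighted Group Steiner Tree (illustrative, not BubbleRAG).

    Tries every vertex as root, links it to the nearest vertex of each group, and
    keeps the root whose union of paths has the fewest edges. All paths come from
    one BFS tree, so each union is itself a tree. Returns (root, edges) or None.
    """
    best = None
    for root in graph:
        parent, dist = bfs_tree(graph, root)
        edges = set()
        for group in groups:
            reachable = [v for v in group if v in dist]
            if not reachable:
                break  # this root cannot reach the group; try the next root
            nearest = min(reachable, key=dist.__getitem__)
            edges |= tree_path(parent, nearest)
        else:  # every group was reached from this root
            if best is None or len(edges) < len(best[1]):
                best = (root, edges)
    return best


# Toy knowledge graph as undirected adjacency lists (hypothetical data).
triples = [
    ("Marie Curie", "Pierre Curie"),
    ("Marie Curie", "Warsaw"),
    ("Warsaw", "Poland"),
    ("Pierre Curie", "Paris"),
    ("Paris", "France"),
]
graph = {}
for u, v in triples:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

# Two anchor groups: the question entity and candidate answer entities.
root, edges = group_steiner_star(graph, [{"Marie Curie"}, {"Poland", "France"}])
print(sorted(sorted(e) for e in edges))
# → [['Marie Curie', 'Warsaw'], ['Poland', 'Warsaw']]
```

If the root is chosen on an optimal tree, each group's nearest vertex is no farther away than that tree's total size, so the returned tree has at most k times the optimal number of edges for k groups. Weighted edges or richer scoring of which subgraph is most informative would need a different search.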
