BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Abstract

Large language models (LLMs) hallucinate on knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising remedy, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs, that is, graphs whose schema and structure are unknown in advance. We identify three core challenges: two that cause recall loss (semantic instantiation uncertainty and structural path uncertainty) and one that causes precision loss (evidential comparison uncertainty). To address these challenges, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of the Group Steiner Tree problem, and prove that it is NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes for both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks demonstrate that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
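The Group Steiner Tree framing can be made concrete on a toy graph: given several groups of candidate anchor nodes, find a small connected subgraph that touches at least one node from every group. The sketch below is not BubbleRAG's bubble expansion and not the OISR objective. It is a generic shortest-path heuristic, and the graph, node names, and function names are all hypothetical, chosen only for illustration.

```python
from collections import deque


def bfs(graph, root):
    """Breadth-first search returning parent pointers and hop distances from root."""
    parent = {root: None}
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return parent, dist


def group_steiner_heuristic(graph, groups):
    """Try every node as a root; connect it to the nearest member of each group.

    Returns the smallest edge set found (as frozenset pairs), or None if no
    root can reach all groups. This is a heuristic, not an exact solver.
    """
    best_edges = None
    for root in sorted(graph):
        parent, dist = bfs(graph, root)
        edges = set()
        feasible = True
        for group in groups:
            reachable = [n for n in sorted(group) if n in dist]
            if not reachable:
                feasible = False
                break
            # Pick the group member closest to the root and walk back to the root.
            node = min(reachable, key=lambda n: dist[n])
            while parent[node] is not None:
                edges.add(frozenset((node, parent[node])))
                node = parent[node]
        if feasible and (best_edges is None or len(edges) < len(best_edges)):
            best_edges = edges
    return best_edges


# Hypothetical undirected toy graph: two candidate anchors per question mention.
toy_graph = {
    "q1a": ["x"],
    "q1b": ["y"],
    "q2a": ["x"],
    "q2b": ["z"],
    "x": ["q1a", "q2a", "y"],
    "y": ["q1b", "x", "z"],
    "z": ["y", "q2b"],
}
anchor_groups = [{"q1a", "q1b"}, {"q2a", "q2b"}]
evidence_edges = group_steiner_heuristic(toy_graph, anchor_groups)
```

On this toy input the heuristic selects the two edges joining `q1a` and `q2a` through `x`, the smallest subgraph covering both groups. Because the exact problem is NP-hard, practical systems rely on heuristics of this general kind rather than exhaustive search.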
