BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Abstract

Large language models (LLMs) exhibit hallucinations in knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising solution, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs, that is, graphs whose schema and structure are unknown in advance. We identify three core challenges: two that cause recall loss (semantic instantiation uncertainty and structural path uncertainty) and one that causes precision loss (evidential comparison uncertainty). To address these challenges, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of Group Steiner Tree, and prove that it is NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks demonstrate that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
