BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Abstract

Large language models (LLMs) hallucinate on knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising remedy, yet existing approaches face fundamental recall and precision limitations on black-box knowledge graphs, that is, graphs whose schema and structure are unknown in advance. We identify three core challenges: semantic instantiation uncertainty and structural path uncertainty, which cause recall loss, and evidential comparison uncertainty, which causes precision loss. To address them, we formalize retrieval as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of Group Steiner Tree, and prove that it is NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks show that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
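For intuition about the Group Steiner Tree view of retrieval, the following is a minimal sketch of a textbook star-shaped heuristic: expand outward from each anchor group with breadth-first search, pick a node that every group can reach with the smallest total hop count, and keep the paths back to each group as a candidate subgraph. It is not BubbleRAG's bubble expansion, whose details the abstract does not give. The names `build_graph`, `bfs_from_group`, and `connect_anchor_groups` are hypothetical, and the graph is assumed to be unweighted and undirected.

```python
from collections import deque


def build_graph(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_from_group(graph, group):
    """Multi-source BFS: hop distance and parent pointer from the nearest group member."""
    dist = {node: 0 for node in group if node in graph}
    parent = {node: None for node in dist}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                parent[neighbor] = node
                queue.append(neighbor)
    return dist, parent


def connect_anchor_groups(graph, groups):
    """Return (meeting_node, edges) for a subgraph touching every group, or None.

    Illustrative heuristic only (an assumption, not the paper's method): it does
    not guarantee a minimum-size tree.
    """
    searches = [bfs_from_group(graph, group) for group in groups]
    # Keep only nodes that every group's expansion has reached.
    common = set(graph)
    for dist, _ in searches:
        common &= set(dist)
    if not common:
        return None
    # Smallest total hop distance wins; ties are broken by node name.
    meeting = min(common, key=lambda n: (sum(d[n] for d, _ in searches), n))
    edges = set()
    for _, parent in searches:
        node = meeting
        while parent[node] is not None:
            edges.add(frozenset((node, parent[node])))
            node = parent[node]
    return meeting, edges


# Toy graph: two anchor groups, the second with two alternative instantiations.
toy_graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")])
toy_result = connect_anchor_groups(toy_graph, [{"A"}, {"D", "E"}])
```

In this toy graph the second group offers two alternative anchors, `D` and `E`, and the heuristic connects `A` to whichever alternative is closer. That mirrors why grouping candidate anchors matters when the correct instantiation of a query entity is uncertain.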
