

HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

October 1, 2025
作者: Loris Bergeron, Ioana Buhnila, Jérôme François, Radu State
cs.AI

Abstract

Large Language Models (LLMs) excel in many NLP tasks but remain prone to hallucinations, limiting trust in real-world applications. We present HalluGuard, a 4B-parameter Small Reasoning Model (SRM) for mitigating hallucinations in Retrieval-Augmented Generation (RAG). HalluGuard classifies document-claim pairs as grounded or hallucinated and produces evidence-grounded justifications for transparency. Our approach combines (i) a domain-agnostic synthetic dataset derived from FineWeb and refined through multi-stage curation and data reformation, (ii) synthetic grounded and hallucinated claims, and (iii) preference-based fine-tuning with Odds Ratio Preference Optimization (ORPO) to distill large-model reasoning into a smaller backbone. On the RAGTruth subset of the LLM-AggreFact benchmark, HalluGuard achieves 84.0% balanced accuracy (BAcc), rivaling the specialized models MiniCheck (7B; 84.0%) and Granite Guardian 3.3 (8B; 82.2%) while using roughly half their parameters. Over the full benchmark, it reaches 75.7% BAcc, matching larger general-purpose LLMs such as GPT-4o (75.9%). We will release HalluGuard and its datasets under the Apache 2.0 license upon acceptance.
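
For readers who want the two technical quantities the abstract leans on made explicit, the block below recalls the standard definitions of balanced accuracy and of the Odds Ratio Preference Optimization (ORPO) objective used for the preference-based distillation. The notation (y_w for the preferred, grounded response; y_l for the rejected, hallucinated one; λ for the preference-weighting hyperparameter) follows the general ORPO formulation, not details specific to this paper.

```latex
% Balanced accuracy: the mean of sensitivity and specificity,
% the metric reported by LLM-AggreFact.
\mathrm{BAcc} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)

% ORPO (Hong et al., 2024): a supervised loss on the preferred response is
% combined with an odds-ratio term that pushes the policy toward the chosen
% (grounded) response y_w and away from the rejected (hallucinated) one y_l.
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}

\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)

\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\,y_w,\,y_l)}\big[\,\mathcal{L}_{SFT}(x, y_w) + \lambda\,\mathcal{L}_{OR}(x, y_w, y_l)\,\big]
```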
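The abstract also describes HalluGuard's interface: it takes a document-claim pair and returns a grounded/hallucinated verdict with an evidence-based justification. Since the model is not yet released, the sketch below is only an illustration of how such an SRM could be invoked with the Hugging Face Transformers library; the checkpoint identifier and the prompt wording are assumptions, not the authors' actual release or prompt format.

```python
# Minimal sketch of a document-claim grounding check with a small reasoning model.
# Assumptions: the checkpoint name "halluguard-4b" is hypothetical, and the prompt
# below is illustrative; the real HalluGuard release may use a different format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "halluguard-4b"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def check_grounding(document: str, claim: str) -> str:
    """Ask the model whether `claim` is grounded in `document` and why."""
    prompt = (
        "Document:\n" + document + "\n\n"
        "Claim:\n" + claim + "\n\n"
        "Decide whether the claim is grounded in the document or hallucinated, "
        "and justify your verdict with evidence quoted from the document."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens and return only the generated verdict + justification.
    generated = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
```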