HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

October 1, 2025
Authors: Loris Bergeron, Ioana Buhnila, Jérôme François, Radu State
cs.AI

Abstract

Large Language Models (LLMs) excel in many NLP tasks but remain prone to hallucinations, limiting trust in real-world applications. We present HalluGuard, a 4B-parameter Small Reasoning Model (SRM) for mitigating hallucinations in Retrieval-Augmented Generation (RAG). HalluGuard classifies document-claim pairs as grounded or hallucinated and produces evidence-grounded justifications for transparency. Our approach combines (i) a domain-agnostic synthetic dataset derived from FineWeb and refined through multi-stage curation and data reformation, (ii) synthetic grounded and hallucinated claims, and (iii) preference-based fine-tuning with Odds Ratio Preference Optimization to distill large-model reasoning into a smaller backbone. On the RAGTruth subset of the LLM-AggreFact benchmark, HalluGuard achieves 84.0% balanced accuracy (BAcc), rivaling the specialized models MiniCheck (7B; 84.0%) and Granite Guardian 3.3 (8B; 82.2%) while using roughly half their parameters. Over the full benchmark, it reaches 75.7% BAcc, matching larger general-purpose LLMs such as GPT-4o (75.9%). We will release HalluGuard and its datasets under Apache 2.0 upon acceptance.
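
HalluGuard's core task is a binary verdict over document-claim pairs, and the headline metric is balanced accuracy, i.e. the mean of the recall on the grounded and hallucinated classes. The sketch below illustrates that evaluation loop under stated assumptions: the checkpoint name, prompt wording, and verdict parsing are hypothetical placeholders (the model and its interface have not yet been released); only the balanced-accuracy computation follows its standard definition.

```python
# Minimal sketch of grounded/hallucinated classification and balanced-accuracy
# scoring. The checkpoint name, prompt wording, and label parsing are
# hypothetical placeholders, NOT the released HalluGuard interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.metrics import balanced_accuracy_score

MODEL_ID = "your-org/halluguard-4b"  # hypothetical: no checkpoint is public yet
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

def classify(document: str, claim: str) -> int:
    """Return 1 if the claim is judged grounded in the document, else 0."""
    prompt = (
        f"Document:\n{document}\n\n"
        f"Claim:\n{claim}\n\n"
        "Is the claim GROUNDED in the document or HALLUCINATED? "
        "Answer with one label and cite the supporting evidence.\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True).upper()
    # Illustrative parsing only: take whichever label the model emits first.
    g, h = answer.find("GROUNDED"), answer.find("HALLUCINATED")
    return 1 if g != -1 and (h == -1 or g < h) else 0

# Toy evaluation set: 1 = grounded, 0 = hallucinated.
pairs = [
    ("The Eiffel Tower is in Paris.", "The Eiffel Tower is located in Paris.", 1),
    ("The Eiffel Tower is in Paris.", "The Eiffel Tower was built in 1750.", 0),
]
gold = [label for _, _, label in pairs]
pred = [classify(doc, claim) for doc, claim, _ in pairs]

# Balanced accuracy = mean of per-class recall, robust to label imbalance.
print(f"BAcc: {balanced_accuracy_score(gold, pred):.3f}")
```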