
FaithLens: Detecting and Explaining Faithfulness Hallucination

December 23, 2025
Authors: Shuzheng Si, Qingyi Wang, Haozhe Zhao, Yuzhuo Bai, Guanqiao Chen, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
cs.AI

Abstract

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
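
The abstract does not spell out the exact form of the rule-based reward used during reinforcement learning. Below is a minimal sketch, assuming a weighted combination of a binary prediction-correctness check and a crude explanation-quality proxy; the function names, the token-overlap proxy, and the weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed, not from the paper) of a rule-based reward that
# combines binary prediction correctness with an explanation-quality score.

def prediction_reward(predicted_label: str, gold_label: str) -> float:
    """1.0 if the binary hallucination verdict matches the gold label, else 0.0."""
    return 1.0 if predicted_label.strip().lower() == gold_label.strip().lower() else 0.0

def explanation_reward(explanation: str, reference_span: str) -> float:
    """Crude proxy for explanation quality: token overlap with a reference
    evidence span (a real system would likely use an LLM or trained judge)."""
    expl_tokens = set(explanation.lower().split())
    ref_tokens = set(reference_span.lower().split())
    if not ref_tokens:
        return 0.0
    return len(expl_tokens & ref_tokens) / len(ref_tokens)

def total_reward(predicted_label: str, gold_label: str,
                 explanation: str, reference_span: str,
                 alpha: float = 0.7) -> float:
    """Weighted sum of prediction correctness and explanation quality.
    The weight alpha is a hypothetical hyperparameter."""
    return (alpha * prediction_reward(predicted_label, gold_label)
            + (1.0 - alpha) * explanation_reward(explanation, reference_span))

# Example usage
print(total_reward("hallucinated", "hallucinated",
                   "The summary claims a date not present in the source.",
                   "the source never states a date"))
```

In this kind of setup, the correctness term keeps the detector's binary verdict accurate while the explanation term discourages degenerate or uninformative rationales; how the paper actually scores explanations is not specified in the abstract.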