FaithLens: Detecting and Explaining Faithfulness Hallucination
December 23, 2025
Authors: Shuzheng Si, Qingyi Wang, Haozhe Zhao, Yuzhuo Bai, Guanqiao Chen, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
cs.AI
Abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
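To make the training recipe concrete, below is a minimal sketch of a rule-based reward of the kind the abstract describes, combining binary prediction correctness with an explanation-quality term. The `Sample` record, the `alpha` weight, and the gating of the quality term on correctness are all assumptions for illustration; the paper's exact reward rules are not given here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    label: int                # gold binary label: 1 = hallucinated, 0 = faithful
    prediction: int           # binary prediction parsed from the model's output
    explanation_score: float  # explanation quality in [0, 1], e.g. from an LLM judge


def rule_based_reward(sample: Sample, alpha: float = 0.5) -> float:
    """Composite reward: prediction correctness plus a weighted
    explanation-quality bonus (hypothetical weighting scheme)."""
    correctness = 1.0 if sample.prediction == sample.label else 0.0
    # Assumption: the quality bonus only applies when the prediction is
    # correct, so the policy cannot trade accuracy for fluent explanations.
    return correctness + alpha * sample.explanation_score * correctness
```

A reward of this shape would then drive a standard policy-optimization loop over the cold-start model, which is consistent with, but not specified by, the abstract.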