Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

May 23, 2025
Authors: Shrey Pandit, Ashwin Vinod, Liu Leqi, Ying Ding
cs.AI

Abstract

Aligning large language models (LLMs) to accurately detect hallucinations remains a significant challenge due to the sophisticated nature of hallucinated text. Recognizing that hallucinated samples typically exhibit higher deceptive quality than traditional negative samples, we use these carefully engineered hallucinations as negative examples in the DPO alignment procedure. Our method incorporates a curriculum learning strategy, gradually transitioning the training from easier samples, identified by the greatest reduction in probability scores from independent fact-checking models, to progressively harder ones. This structured difficulty scaling ensures stable and incremental learning. Experimental evaluation demonstrates that our HaluCheck models, trained with the curriculum DPO approach and high-quality negative samples, significantly improve performance across various metrics, achieving gains of up to 24% on difficult benchmarks like MedHallu and HaluEval. Additionally, HaluCheck models demonstrate robustness in zero-shot settings, significantly outperforming larger state-of-the-art models across various benchmarks.
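The curriculum described in the abstract amounts to ranking each (truthful, hallucinated) DPO pair by how sharply an independent fact-checker's probability drops on the hallucinated text, then feeding the easiest pairs to the trainer first. Below is a minimal Python sketch of that ordering under stated assumptions: `fact_check_prob` is a hypothetical stand-in for the paper's independent fact-checking model, not an API from its released code.

```python
# Minimal sketch of curriculum ordering for DPO pairs, easy to hard.
# `fact_check_prob` is a hypothetical scorer returning P(text is factual);
# it stands in for the independent fact-checking model the paper describes.
from typing import Callable, Dict, List

def difficulty(sample: Dict[str, str],
               fact_check_prob: Callable[[str], float]) -> float:
    """Score one (chosen=truthful, rejected=hallucinated) pair.

    A large drop in the fact-checker's probability on the hallucinated
    negative means the hallucination is easy to detect; a small drop
    means it is deceptive, i.e. hard.
    """
    drop = fact_check_prob(sample["chosen"]) - fact_check_prob(sample["rejected"])
    return -drop  # smaller drop -> harder -> larger difficulty score

def curriculum_order(dataset: List[Dict[str, str]],
                     fact_check_prob: Callable[[str], float]) -> List[Dict[str, str]]:
    """Sort DPO pairs from easiest (largest probability drop) to hardest."""
    return sorted(dataset, key=lambda s: difficulty(s, fact_check_prob))

# The ordered pairs would then be consumed stage by stage by a standard
# DPO trainer, easiest stage first, per the curriculum strategy above.
```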
