Learning to Reason for Hallucination Span Detection
October 2, 2025
Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli
cs.AI
Abstract
Large language models (LLMs) often generate hallucinations -- unsupported content that undermines reliability. While most prior works frame hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision-making process. This naturally raises the question of whether explicit reasoning can help with the complex task of detecting hallucination spans. To answer this question, we first evaluate pretrained models with and without Chain-of-Thought (CoT) reasoning, and show that CoT reasoning has the potential to generate at least one correct answer when sampled multiple times. Motivated by this, we propose RL4HS, a reinforcement learning framework that incentivizes reasoning with a span-level reward function. RL4HS builds on Group Relative Policy Optimization and introduces Class-Aware Policy Optimization to mitigate the reward imbalance issue. Experiments on the RAGTruth benchmark (summarization, question answering, data-to-text) show that RL4HS surpasses pretrained reasoning models and supervised fine-tuning, demonstrating the necessity of reinforcement learning with span-level rewards for detecting hallucination spans.
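The abstract states only that RL4HS optimizes a span-level reward on top of Group Relative Policy Optimization; it does not specify the reward itself or the Class-Aware Policy Optimization details. The sketch below is an illustrative assumption of what such a pipeline could look like: a hypothetical character-level span-F1 reward (`span_f1_reward`) and a GRPO-style group-relative advantage (`group_relative_advantages`), where each sampled response's reward is normalized by the mean and standard deviation of its sampling group.

```python
# Hedged sketch of a span-level reward and GRPO-style advantages.
# The exact reward used by RL4HS is not given in this abstract; the
# character-level span F1 below is an illustrative assumption.

from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_f1_reward(predicted: List[Span], gold: List[Span]) -> float:
    """Character-level F1 between predicted and gold hallucination spans.

    Each character index covered by a span counts as a positive label;
    precision and recall are computed over those index sets. If both sets
    are empty (the response is predicted and labeled faithful), return 1.0.
    """
    pred_chars = {i for s, e in predicted for i in range(s, e)}
    gold_chars = {i for s, e in gold for i in range(s, e)}
    if not pred_chars and not gold_chars:
        return 1.0
    if not pred_chars or not gold_chars:
        return 0.0
    tp = len(pred_chars & gold_chars)
    precision = tp / len(pred_chars)
    recall = tp / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each sampled response's reward by
    the mean and standard deviation of rewards within its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: two sampled responses for the same input, scored against gold spans.
gold_spans = [(10, 25)]
rewards = [
    span_f1_reward([(12, 25)], gold_spans),  # close match -> high reward
    span_f1_reward([], gold_spans),          # missed span -> zero reward
]
print(group_relative_advantages(rewards))
```

A span-level reward like this gives partial credit for overlapping but imperfect spans, which is one plausible way to make the multi-step span-detection task learnable with reinforcement learning, in contrast to a binary hallucinated/faithful reward.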