

Learning to Reason for Hallucination Span Detection

October 2, 2025
Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli
cs.AI

Abstract

Large language models (LLMs) often generate hallucinations: unsupported content that undermines their reliability. While most prior work frames hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision-making process. This naturally raises the question of whether explicit reasoning can help with the complex task of detecting hallucination spans. To answer this question, we first evaluate pretrained models with and without Chain-of-Thought (CoT) reasoning, and show that CoT reasoning has the potential to generate at least one correct answer when sampled multiple times. Motivated by this, we propose RL4HS, a reinforcement learning framework that incentivizes reasoning with a span-level reward function. RL4HS builds on Group Relative Policy Optimization and introduces Class-Aware Policy Optimization to mitigate the reward imbalance issue. Experiments on the RAGTruth benchmark (summarization, question answering, data-to-text) show that RL4HS surpasses pretrained reasoning models and supervised fine-tuning, demonstrating the necessity of reinforcement learning with span-level rewards for detecting hallucination spans.
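The abstract does not spell out the exact form of the span-level reward. As a rough illustration only, a reward of this kind can score predicted hallucination spans against annotated ones with a character-level F1 overlap, as in the minimal sketch below; the function `span_f1_reward` and its (start, end) span format are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's exact reward): score predicted
# hallucination spans against gold spans with character-level F1 overlap,
# the kind of span-level signal an RL4HS-style trainer could optimize.

def span_f1_reward(predicted, gold):
    """Return an F1-style reward for lists of (start, end) character spans."""
    pred_chars = set()
    for start, end in predicted:
        pred_chars.update(range(start, end))
    gold_chars = set()
    for start, end in gold:
        gold_chars.update(range(start, end))

    # Predicting "no hallucination" is rewarded only when it is correct.
    if not pred_chars and not gold_chars:
        return 1.0
    if not pred_chars or not gold_chars:
        return 0.0

    overlap = len(pred_chars & gold_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


# Example: a predicted span that half-overlaps the gold span earns reward 0.5.
print(span_f1_reward(predicted=[(10, 30)], gold=[(20, 40)]))
```

Because a reward like this is zero for most incorrect predictions but the "no hallucination" class is easy to get right, per-class reward statistics can be imbalanced, which is the kind of issue the abstract's Class-Aware Policy Optimization is described as addressing.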