

Are Reasoning Models More Prone to Hallucination?

May 29, 2025
作者: Zijun Yao, Yantao Liu, Yanxu Chen, Jianhui Chen, Junfeng Fang, Lei Hou, Juanzi Li, Tat-Seng Chua
cs.AI

Abstract

Recently developed large reasoning models (LRMs) show strong performance on complex tasks thanks to long chain-of-thought (CoT) reasoning. Because these LRMs are mostly developed by post-training on formal reasoning tasks, it remains unclear and debated whether their reasoning capability generalizes to reduce hallucination in fact-seeking tasks. For instance, DeepSeek-R1 reports improved performance on SimpleQA, a fact-seeking benchmark, while OpenAI-o3 observes even more severe hallucination. This discrepancy naturally raises the following research question: are reasoning models more prone to hallucination? This paper addresses the question from three perspectives. (1) We first conduct a holistic evaluation of hallucination in LRMs. Our analysis reveals that LRMs that undergo a full post-training pipeline, with cold-start supervised fine-tuning (SFT) and verifiable-reward reinforcement learning (RL), generally show alleviated hallucination. In contrast, both distillation alone and RL training without cold-start fine-tuning introduce more nuanced hallucinations. (2) To explore why different post-training pipelines alter the extent of hallucination in LRMs, we conduct a behavior analysis. We characterize two critical cognitive behaviors that directly affect the factuality of an LRM: Flaw Repetition, where surface-level reasoning attempts repeatedly follow the same underlying flawed logic, and Think-Answer Mismatch, where the final answer fails to faithfully match the preceding CoT process. (3) Further, we investigate the mechanism behind LRM hallucination from the perspective of model uncertainty. We find that increased hallucination in LRMs is usually associated with misalignment between model uncertainty and factual accuracy. Our work provides an initial understanding of hallucination in LRMs.
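The abstract refers to two measurable quantities: how often a final answer contradicts the model's own chain of thought (Think-Answer Mismatch), and how well model confidence tracks factual accuracy (uncertainty-accuracy alignment). The snippet below is a minimal sketch of how such quantities could be computed, not the authors' implementation; the `Sample` container, the `extract_cot_conclusion` callable, and the `confidence` field are hypothetical placeholders for whatever extraction and confidence estimation a real evaluation pipeline would use.

```python
# Minimal sketch (assumptions noted above, not the paper's code) of two diagnostics:
# (1) Think-Answer Mismatch rate: fraction of samples whose final answer disagrees
#     with the conclusion reached inside the chain of thought.
# (2) Expected calibration error (ECE): a standard proxy for misalignment between
#     model confidence and factual accuracy.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    cot: str            # the model's chain-of-thought text
    answer: str         # the model's final answer
    gold: str           # the reference answer
    confidence: float   # answer-level confidence in [0, 1] (hypothetical field)

def normalize(text: str) -> str:
    """Crude normalization so comparisons ignore case and extra whitespace."""
    return " ".join(text.lower().split())

def think_answer_mismatch_rate(samples: List[Sample],
                               extract_cot_conclusion: Callable[[str], str]) -> float:
    """Fraction of samples whose final answer differs from the CoT's own conclusion."""
    mismatches = 0
    for s in samples:
        conclusion = extract_cot_conclusion(s.cot)  # hypothetical extraction step
        if normalize(conclusion) != normalize(s.answer):
            mismatches += 1
    return mismatches / max(len(samples), 1)

def expected_calibration_error(samples: List[Sample], n_bins: int = 10) -> float:
    """ECE: size-weighted average of |accuracy - confidence| over confidence bins.
    Larger values indicate stronger uncertainty-accuracy misalignment."""
    bins: List[List[Sample]] = [[] for _ in range(n_bins)]
    for s in samples:
        idx = min(int(s.confidence * n_bins), n_bins - 1)
        bins[idx].append(s)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        acc = sum(normalize(s.answer) == normalize(s.gold) for s in bucket) / len(bucket)
        conf = sum(s.confidence for s in bucket) / len(bucket)
        ece += (len(bucket) / len(samples)) * abs(acc - conf)
    return ece
```

Both diagnostics operate on the same per-sample records, so they can be computed in a single pass over a benchmark such as SimpleQA once answers, chains of thought, and confidences have been logged.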
