Are Reasoning Models More Prone to Hallucination?
May 29, 2025
Authors: Zijun Yao, Yantao Liu, Yanxu Chen, Jianhui Chen, Junfeng Fang, Lei Hou, Juanzi Li, Tat-Seng Chua
cs.AI
Abstract
Recently emerged large reasoning models (LRMs) demonstrate strong performance in
solving complex tasks with long chain-of-thought (CoT) reasoning. As
these LRMs are mostly developed by post-training on formal reasoning tasks,
whether they can generalize this reasoning capability to reduce hallucination
in fact-seeking tasks remains unclear and debated. For instance, DeepSeek-R1
reports increased performance on SimpleQA, a fact-seeking benchmark, while
OpenAI-o3 exhibits even more severe hallucination. This discrepancy naturally
raises the following research question: Are reasoning models more prone to
hallucination? This paper addresses the question from three perspectives. (1)
We first conduct a holistic evaluation of hallucination in LRMs. Our
analysis reveals that LRMs that undergo a full post-training pipeline, with
cold-start supervised fine-tuning (SFT) and verifiable-reward RL, generally
show alleviated hallucination. In contrast, both distillation alone and RL
training without cold-start fine-tuning introduce more nuanced hallucinations.
(2) To explore how different post-training pipelines affect hallucination in
LRMs, we conduct a behavior analysis. We characterize two
critical cognitive behaviors that directly affect the factuality of an LRM: Flaw
Repetition, where the surface-level reasoning attempts repeatedly follow the
same underlying flawed logic, and Think-Answer Mismatch, where the final answer
fails to faithfully reflect the preceding CoT process. (3) Further, we investigate
the mechanism behind the hallucination of LRMs from the perspective of model
uncertainty. We find that increased hallucination of LRMs is usually associated
with the misalignment between model uncertainty and factual accuracy. Our work
provides an initial understanding of hallucination in LRMs.
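To make the Think-Answer Mismatch behavior concrete, here is a minimal Python sketch of one way such a mismatch could be flagged automatically. The regex-based conclusion extraction and the `think_answer_mismatch` helper are illustrative assumptions, not the paper's evaluation protocol.

```python
# A minimal sketch (not the paper's protocol) of flagging Think-Answer Mismatch:
# extract the conclusion stated inside the CoT and check whether the final
# answer agrees with it. The patterns and normalization are assumptions.

import re

def _normalize(text: str) -> str:
    # Lowercase and strip punctuation so surface differences do not count as mismatch.
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def think_answer_mismatch(cot: str, final_answer: str) -> bool:
    """Return True if the CoT's stated conclusion disagrees with the final answer."""
    # Look for a conclusion-like sentence inside the reasoning trace.
    match = re.search(r"(?:so|therefore|thus)[, ]+the answer is (.+?)(?:\.|$)",
                      cot, flags=re.IGNORECASE)
    if match is None:
        return False  # no explicit conclusion found; cannot judge mismatch
    cot_answer = _normalize(match.group(1))
    answer = _normalize(final_answer)
    return answer not in cot_answer and cot_answer not in answer

# Example usage with a toy trace.
cot = "The capital moved in 1991. Therefore, the answer is Astana."
print(think_answer_mismatch(cot, "Almaty"))  # True: final answer contradicts the CoT
print(think_answer_mismatch(cot, "Astana"))  # False: final answer matches the CoT
```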
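Similarly, the misalignment between model uncertainty and factual accuracy discussed in point (3) can be viewed as a calibration gap. The sketch below estimates it with expected calibration error (ECE) over hypothetical `confidences` and `corrects` arrays; this is an assumed illustration, not the paper's metric.

```python
# A minimal sketch (not the paper's code) of quantifying the gap between a
# model's uncertainty and its factual accuracy via expected calibration error.
# `confidences` holds per-question confidence (e.g., answer-token probability),
# `corrects` holds whether each answer was factually correct.

import numpy as np

def expected_calibration_error(confidences, corrects, n_bins=10):
    """Bin predictions by confidence and average the |accuracy - confidence| gap."""
    confidences = np.asarray(confidences, dtype=float)
    corrects = np.asarray(corrects, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        bin_acc = corrects[mask].mean()      # factual accuracy in this bin
        bin_conf = confidences[mask].mean()  # average stated confidence
        ece += mask.mean() * abs(bin_acc - bin_conf)  # weight by bin size
    return ece

# A well-calibrated model has low ECE; a model whose confidence drifts away
# from its factual accuracy (the misalignment the abstract links to increased
# hallucination) has high ECE.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 0, 1, 1]))
```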