Are Reasoning Models More Prone to Hallucination?

Abstract

Recently evolved large reasoning models (LRMs) show strong performance on complex tasks thanks to their long chain-of-thought (CoT) reasoning capability. Because these LRMs are mostly developed by post-training on formal reasoning tasks, whether their reasoning capability generalizes to reduce hallucination in fact-seeking tasks remains unclear and debated. For instance, DeepSeek-R1 reports improved performance on SimpleQA, a fact-seeking benchmark, while OpenAI-o3 observes even more severe hallucination. This discrepancy naturally raises the following research question: Are reasoning models more prone to hallucination? This paper addresses the question from three perspectives. (1) We first conduct a holistic evaluation of hallucination in LRMs. Our analysis reveals that LRMs that undergo a full post-training pipeline, with cold-start supervised fine-tuning (SFT) and RL with verifiable rewards, generally hallucinate less. In contrast, both distillation alone and RL training without cold-start fine-tuning introduce more nuanced hallucinations. (2) To explore why different post-training pipelines affect hallucination in LRMs differently, we conduct a behavior analysis. We characterize two critical cognitive behaviors that directly affect the factuality of an LRM: Flaw Repetition, where surface-level reasoning attempts repeatedly follow the same underlying flawed logic, and Think-Answer Mismatch, where the final answer fails to faithfully match the preceding CoT process. (3) Further, we investigate the mechanism behind hallucination in LRMs from the perspective of model uncertainty. We find that increased hallucination in LRMs is usually associated with a misalignment between model uncertainty and factual accuracy. Our work provides an initial understanding of hallucination in LRMs.
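As an illustration of what a misalignment between model uncertainty and factual accuracy can look like in numbers, the following is a minimal sketch that computes a standard expected calibration error (ECE) over a set of answers. The abstract does not say which uncertainty measure the authors use, so ECE is an assumption chosen for illustration, and the function name, the per-answer confidence scores, and the bin count are all hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 5,
) -> float:
    """Standard binned ECE: weighted mean of |confidence - accuracy| per bin.

    `confidences` are hypothetical per-answer confidence scores in [0, 1];
    `correct` marks whether each answer was factually right.
    This is an illustrative metric, not the paper's stated method.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of answers that fall in it.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

Under this sketch, a model that answers with confidence 0.9 and is right 9 times out of 10 scores close to zero, whereas a model with the same confidence that is right only 3 times out of 10 scores about 0.6; a gap of the second kind is one way the reported misalignment could show up.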
