Are Reasoning Models More Prone to Hallucination?

Abstract

Recently developed large reasoning models (LRMs) show strong performance on complex tasks thanks to their long chain-of-thought (CoT) reasoning capability. Because these LRMs are mostly developed by post-training on formal reasoning tasks, whether they generalize this reasoning capability to reduce hallucination in fact-seeking tasks remains unclear and debated. For instance, DeepSeek-R1 reports improved performance on SimpleQA, a fact-seeking benchmark, while OpenAI-o3 observes even more severe hallucination. This discrepancy naturally raises the following research question: Are reasoning models more prone to hallucination? This paper addresses the question from three perspectives. (1) We first conduct a holistic evaluation of hallucination in LRMs. Our analysis reveals that LRMs that undergo a full post-training pipeline, with cold-start supervised fine-tuning (SFT) followed by reinforcement learning (RL) with verifiable rewards, generally hallucinate less. In contrast, both distillation alone and RL training without cold-start fine-tuning introduce more nuanced hallucinations. (2) To explore why different post-training pipelines alter the impact on hallucination in LRMs, we conduct a behavior analysis. We characterize two critical cognitive behaviors that directly affect the factuality of an LRM: Flaw Repetition, where surface-level reasoning attempts repeatedly follow the same underlying flawed logic, and Think-Answer Mismatch, where the final answer fails to faithfully match the preceding CoT process. (3) Finally, we investigate the mechanism behind hallucination in LRMs from the perspective of model uncertainty. We find that increased hallucination in LRMs is usually associated with a misalignment between model uncertainty and factual accuracy. Our work provides an initial understanding of hallucination in LRMs.
