Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
March 10, 2026
Authors: Zorik Gekhman, Roee Aharoni, Eran Ofek, Mor Geva, Roi Reichart, Jonathan Herzig
cs.AI
Abstract
While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.