Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
March 10, 2026
Authors: Zorik Gekhman, Roee Aharoni, Eran Ofek, Mor Geva, Roi Reichart, Jonathan Herzig
cs.AI
Abstract
While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.
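
The closing claim, that accuracy improves when hallucination-free reasoning trajectories are prioritized, implies a simple selection procedure over sampled trajectories. Below is a minimal sketch of that idea, assuming trajectories have already been sampled and their intermediate factual statements extracted; the `Trajectory` type, the `is_supported` verifier (a toy membership test standing in for a real fact-checker), and the fallback behavior are illustrative assumptions, not the paper's actual method:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    facts: list[str]  # intermediate factual statements extracted from the reasoning
    answer: str       # final answer produced at the end of the trajectory


def is_supported(fact: str, knowledge_base: set[str]) -> bool:
    # Hypothetical verifier: a simple membership test against a reference
    # fact set stands in for a real fact-checking model or retriever.
    return fact in knowledge_base


def select_answer(trajectories: list[Trajectory], knowledge_base: set[str]) -> str:
    # Keep only trajectories whose every intermediate fact is supported,
    # i.e. hallucination-free under the verifier.
    clean = [t for t in trajectories
             if all(is_supported(f, knowledge_base) for f in t.facts)]
    # Fall back to the full pool if no trajectory survives filtering
    # (an assumption; the paper may handle this case differently).
    pool = clean or trajectories
    # Majority vote over the final answers of the prioritized pool.
    return Counter(t.answer for t in pool).most_common(1)[0][0]


if __name__ == "__main__":
    kb = {"Canberra is the capital of Australia."}
    trajs = [
        Trajectory(facts=["Sydney is the capital of Australia."], answer="Sydney"),
        Trajectory(facts=["Canberra is the capital of Australia."], answer="Canberra"),
        Trajectory(facts=["Canberra is the capital of Australia."], answer="Canberra"),
    ]
    print(select_answer(trajs, kb))  # -> "Canberra"
```

The key design point this sketch illustrates is that filtering happens on the intermediate facts, not on the answers themselves: a trajectory is trusted because its self-retrieved facts check out, consistent with the paper's finding that hallucinated intermediate facts raise the likelihood of a hallucinated final answer.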