Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty
March 16, 2026
Authors: Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dongsheng Li, Yuqing Yang
cs.AI
Abstract
LLMs often exhibit "Aha moments" during reasoning, such as apparent self-correction following tokens like "Wait," yet their underlying mechanisms remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization: the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results demonstrate that strong reasoning performance is driven by uncertainty externalization rather than by specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for future reasoning model design.
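To make the abstract's central quantities concrete, the following is a minimal information-theoretic sketch in illustrative notation; the symbols $Y$, $r_t$, and $R_{1:t}$ are our assumptions for exposition, not the paper's actual definitions. By the chain rule of mutual information, the information a reasoning trace $R_{1:T} = (r_1, \dots, r_T)$ carries about the answer $Y$ accumulates token by token:

% Illustrative notation (assumed, not taken from the paper):
% Y = final answer, r_t = t-th reasoning token, R_{1:t} = trace prefix.
\[
I\big(Y; R_{1:T}\big) \;=\; \sum_{t=1}^{T} I\big(Y;\, r_t \,\big|\, R_{1:t-1}\big)
\]

On this reading, informational stagnation corresponds to the per-token gains $I(Y; r_t \mid R_{1:t-1})$ vanishing for purely procedural tokens as $t$ grows, while information sufficiency is reached when $I(Y; R_{1:T}) \approx H(Y)$, i.e., the trace resolves essentially all uncertainty about the answer.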