

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

March 16, 2026
Authors: Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dongsheng Li, Yuqing Yang
cs.AI

Abstract

LLMs often exhibit Aha moments during reasoning, such as apparent self-correction following tokens like "Wait," yet their underlying mechanisms remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization: the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results demonstrate that strong reasoning performance is driven by uncertainty externalization rather than specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for future reasoning model design.