Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
October 8, 2025
Authors: Minju Gwak, Guijin Son, Jaehyung Kim
cs.AI
Abstract
The Uniform Information Density (UID) hypothesis suggests that effective
communication maintains a stable flow of information. In this work, we revisit
this principle in the context of large language model (LLM) reasoning traces,
asking whether step-level uniformity reflects reasoning quality. To this end,
we propose an entropy-based stepwise information density metric and introduce
two complementary measures of uniformity, local and global uniformity scores.
Across experiments on six different reasoning benchmarks, we find that
step-level uniformity not only provides a strong theoretical lens but also
yields practical performance benefits; for example, selecting reasoning traces
with more uniform step-level information density yields relative accuracy gains
of 10-32% over baselines on AIME2025. Our analysis further reveals
that correct reasoning traces tend to avoid sharp information density spikes,
while incorrect traces exhibit irregular information bursts. These results
demonstrate that UID-inspired information density measures outperform
alternative internal signals as predictors of reasoning quality. These results
highlight the uniformity of information density as a robust diagnostic and
selection criterion for building more reliable and accurate reasoning systems.
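
As a rough illustration of the kind of metric the abstract describes, the Python sketch below computes a per-step information density from token log-probabilities and two simple uniformity scores over a reasoning trace. The specific formulas used here (mean token surprisal per step, a local score penalizing adjacent-step density changes, and a global score penalizing variance around the trace mean) are assumptions made for illustration only, not the paper's exact definitions.

import numpy as np

# Hypothetical sketch: the exact metric definitions in the paper may differ.
def step_information_density(step_token_logprobs):
    # Mean surprisal (negative log-probability) of the tokens in one reasoning step.
    return -float(np.mean(step_token_logprobs))

def uniformity_scores(trace_steps):
    # trace_steps: list of arrays of token log-probs, one array per reasoning step.
    densities = np.array([step_information_density(s) for s in trace_steps])
    # Local uniformity (assumed form): penalize sharp density changes between adjacent steps.
    local = -float(np.mean(np.diff(densities) ** 2)) if len(densities) > 1 else 0.0
    # Global uniformity (assumed form): penalize deviation of each step from the trace mean.
    global_ = -float(np.var(densities))
    return densities, local, global_

# Toy usage: the last step is an "information burst", so both uniformity scores drop.
steps = [np.array([-0.5, -1.2, -0.8]),
         np.array([-0.7, -0.9]),
         np.array([-2.5, -3.0, -1.8])]
print(uniformity_scores(steps))

Under these assumed definitions, a trace-selection procedure would simply pick, among several candidate traces, the one with the highest local or global uniformity score.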