Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
October 8, 2025
Authors: Minju Gwak, Guijin Son, Jaehyung Kim
cs.AI
Abstract
The Uniform Information Density (UID) hypothesis suggests that effective
communication maintains a stable flow of information. In this work, we revisit
this principle in the context of large language model (LLM) reasoning traces,
asking whether step-level uniformity reflects reasoning quality. To this end,
we propose an entropy-based stepwise information density metric and introduce
two complementary measures of uniformity, local and global uniformity scores.
Across experiments on six different reasoning benchmarks, we find that
step-level uniformity not only provides a strong theoretical lens but also
yields practical performance benefits; for example, selecting reasoning traces
with more uniform step-level information density yields 10-32% relative
accuracy gains over baselines on AIME2025. Our analysis further reveals
that correct reasoning traces tend to avoid sharp information density spikes,
while incorrect traces exhibit irregular information bursts. These results
demonstrate that UID-inspired information density measures outperform
alternative internal signals as predictors of reasoning quality. These
findings highlight the uniformity of information density as a robust
diagnostic and selection criterion for building more reliable and accurate
reasoning systems.
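The abstract does not reproduce the paper's formulas, so the following Python sketch is only illustrative. It assumes step-level information density is the mean token surprisal (negative log-probability) under the generating model, a local uniformity score that penalizes step-to-step density jumps, and a global score that penalizes variance around the trace mean; all function names and exact definitions are assumptions, not the authors' metrics.

```python
import math
from typing import List

def step_density(step_logprobs: List[float]) -> float:
    """Assumed density of one reasoning step: mean token surprisal
    (negative log-probability), an entropy-style measure."""
    return -sum(step_logprobs) / len(step_logprobs)

def local_uniformity(densities: List[float]) -> float:
    """Hypothetical local score: penalize sharp step-to-step density
    jumps. Higher means more locally uniform."""
    if len(densities) < 2:
        return 0.0
    jumps = [abs(b - a) for a, b in zip(densities, densities[1:])]
    return -sum(jumps) / len(jumps)

def global_uniformity(densities: List[float]) -> float:
    """Hypothetical global score: penalize deviation from the trace-wide
    mean density. Higher means more globally uniform."""
    mean = sum(densities) / len(densities)
    var = sum((d - mean) ** 2 for d in densities) / len(densities)
    return -math.sqrt(var)

def select_trace(traces: List[List[List[float]]]) -> int:
    """Trace selection in the spirit of the abstract: each trace is a
    list of steps, each step a list of token log-probs; return the index
    of the trace whose step densities are most (globally) uniform."""
    scores = [global_uniformity([step_density(s) for s in t]) for t in traces]
    return max(range(len(traces)), key=scores.__getitem__)
```

Under this reading, trace selection reduces to scoring each sampled reasoning trace and keeping the one with the flattest step-level density profile, matching the abstract's observation that correct traces avoid sharp information density spikes.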