Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Abstract

Large language models (LLMs) often exhibit "Aha moments" during reasoning, such as apparent self-correction following tokens like "Wait," yet the mechanisms underlying these moments remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization, that is, the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results show that strong reasoning performance is driven by uncertainty externalization rather than by specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for the design of future reasoning models.

