Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Abstract

Large language models (LLMs) often exhibit "Aha moments" during reasoning, such as apparent self-correction following tokens like "Wait," yet the underlying mechanisms remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization, the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results demonstrate that strong reasoning performance is driven by uncertainty externalization rather than by specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for future reasoning model design.

