Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces

Abstract

The Uniform Information Density (UID) hypothesis suggests that effective communication maintains a stable flow of information. In this work, we revisit this principle in the context of large language model (LLM) reasoning traces, asking whether step-level uniformity reflects reasoning quality. To this end, we propose an entropy-based stepwise information density metric and introduce two complementary measures of uniformity: local and global uniformity scores. Across experiments on six reasoning benchmarks, we find that step-level uniformity not only provides a strong theoretical lens but also yields practical performance benefits; for example, selecting reasoning traces with more uniform step-level information density improves accuracy by 10-32% relative to baselines on AIME2025. Our analysis further reveals that correct reasoning traces tend to avoid sharp spikes in information density, while incorrect traces exhibit irregular information bursts. These results demonstrate that UID-inspired information density measures outperform alternative internal signals as predictors of reasoning quality. They also highlight the uniformity of information density as a robust diagnostic and selection criterion for building more reliable and accurate reasoning systems.
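
The selection idea described above can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: step density is taken to be the mean token-level Shannon entropy within a step, the global score is the negative variance of step densities, and the local score is the negative mean squared change between consecutive steps. All function names are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pvariance


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def step_density(step_token_dists):
    """Assumed step-level density: mean token entropy over the step."""
    return mean(token_entropy(d) for d in step_token_dists)


def global_uniformity(densities):
    """Assumed global score: negative variance, so higher means more uniform."""
    return -pvariance(densities)


def local_uniformity(densities):
    """Assumed local score: negative mean squared jump between adjacent steps."""
    if len(densities) < 2:
        return 0.0
    return -mean((b - a) ** 2 for a, b in zip(densities, densities[1:]))


def select_most_uniform(traces, score=global_uniformity):
    """Return the trace (a list of step densities) with the highest score."""
    return max(traces, key=score)


# Toy data: one smooth trace and one with a sharp density spike.
smooth = [1.0, 1.1, 0.9, 1.0]
spiky = [0.2, 3.0, 0.1, 0.7]
best = select_most_uniform([spiky, smooth])
```

In this toy setup the smooth trace is preferred under both scores, which mirrors the reported tendency of correct traces to avoid sharp spikes. The actual metric and scores in the paper may differ from these assumed forms.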
