Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning

Abstract

Test-time compute allocation in large reasoning models (LRMs) is widely used, with applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, by adding generic "thinking tokens", and by prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain, and often ignore, the spontaneous repetition that many LRMs exhibit at the head of their internal reasoning chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the Echo Likelihood Gap ΔL as a computable proxy. This provides the missing theoretical link connecting early repetition to likelihood gains and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop Echo-Distilled SFT (ED-SFT) to instill an "echo-then-reason" pattern through supervised finetuning, and Echoic Prompting (EP) to re-ground the model mid-trace without training. While these methods are promising, quantifying their benefits beyond mere verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an attention refocusing mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
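
The abstract does not spell out how ΔL is computed, so the following is only a minimal sketch under stated assumptions, not the paper's definition. It assumes that ΔL can be approximated as the difference between the log-likelihood of the same answer tokens scored with the echo in context and scored without it, and that per-token log-probabilities are already available from some scoring model. It also illustrates the generic log-normalizer cost of conditioning by rejection on "no echo". The function names `echo_likelihood_gap` and `rejection_cost` are hypothetical and do not come from the released code.

```python
import math


def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities into one sequence log-likelihood."""
    return sum(token_logprobs)


def echo_likelihood_gap(answer_logprobs_with_echo, answer_logprobs_without_echo):
    """Assumed proxy for the gap: log-likelihood of the answer tokens when the
    trace begins with an echo, minus the log-likelihood of the same answer
    tokens when the echo is removed. Positive values mean the echo helps."""
    with_echo = sequence_logprob(answer_logprobs_with_echo)
    without_echo = sequence_logprob(answer_logprobs_without_echo)
    return with_echo - without_echo


def rejection_cost(p_echo):
    """Generic cost of rejection-based conditioning: if a sampled trace starts
    with an echo with probability p_echo, conditioning on 'no echo' divides by
    (1 - p_echo), which costs -log(1 - p_echo) nats."""
    if not 0.0 <= p_echo < 1.0:
        raise ValueError("p_echo must lie in [0, 1)")
    return -math.log1p(-p_echo)


if __name__ == "__main__":
    # Illustrative, made-up log-probabilities for a two-token answer.
    gap = echo_likelihood_gap([-0.1, -0.2], [-0.5, -0.7])
    print(f"gap = {gap:.3f} nats")
    print(f"rejection cost at p_echo=0.5: {rejection_cost(0.5):.3f} nats")
```

In practice the per-token log-probabilities would come from teacher-forced scoring of the answer under the two contexts; the numbers above are placeholders chosen only to exercise the functions.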
