Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
February 6, 2026
Authors: Zhuoyuan Hao, Zhuo Li, Wu Li, Fangming Liu, Min Zhang, Jing Li
cs.AI
Abstract
Test-time compute allocation in large reasoning models (LRMs) is widely used, with applications in mathematical problem solving, code synthesis, and planning. Recent work has approached it by scaling self-consistency and parallel thinking, adding generic "thinking tokens", and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain, and often ignore, the spontaneous repetition that many LRMs exhibit at the head of their internal reasoning chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the Echo Likelihood Gap ΔL as a computable proxy. This supplies the missing theoretical link between early repetition, likelihood gains, and downstream accuracy. It does not, however, specify by itself how to exploit EOP. We therefore develop Echo-Distilled SFT (ED-SFT), which instills an "echo-then-reason" pattern through supervised fine-tuning, and Echoic Prompting (EP), which re-grounds the model mid-trace without training. Because quantifying benefits beyond mere verbosity is non-trivial, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an attention-refocusing mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and compute budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
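The abstract does not spell out how ΔL is defined, so the following is only a minimal sketch of the generic identity behind "rejection-based conditioning", not the authors' method or code. Suppose traces that start with an echo are rejected. The surviving traces then follow p(y | no echo) = p(y) / P(no echo), and the KL divergence of this conditioned distribution from the base model is exactly −log P(no echo). That quantity is also the log of the expected number of draws needed per accepted sample. The trace probabilities and every name below (`TRACES`, `condition_on_no_echo`, `kl_conditioned_vs_base`, `sample_with_rejection`) are hypothetical and chosen for illustration only.

```python
import math
import random

# Hypothetical toy distribution over complete reasoning traces.
# Each entry maps a trace label to (probability under the model, starts_with_echo).
TRACES = {
    "echo+correct": (0.40, True),
    "echo+wrong": (0.10, True),
    "noecho+correct": (0.30, False),
    "noecho+wrong": (0.20, False),
}


def condition_on_no_echo(traces):
    """Exact rejection-based conditioning: drop echo traces, renormalize the rest."""
    accept_mass = sum(p for p, echo in traces.values() if not echo)
    conditioned = {
        name: p / accept_mass
        for name, (p, echo) in traces.items()
        if not echo
    }
    return conditioned, accept_mass


def kl_conditioned_vs_base(conditioned, traces):
    """KL(p(. | no echo) || p), summed over the accepted (echo-free) support."""
    return sum(q * math.log(q / traces[name][0]) for name, q in conditioned.items())


def sample_with_rejection(traces, rng, max_tries=10_000):
    """Draw from p(. | no echo) by sampling from p and rejecting echo traces."""
    names = list(traces)
    weights = [traces[n][0] for n in names]
    for tries in range(1, max_tries + 1):
        name = rng.choices(names, weights=weights, k=1)[0]
        if not traces[name][1]:  # accept only traces without an echo
            return name, tries
    raise RuntimeError("no accepted sample within max_tries")


if __name__ == "__main__":
    conditioned, accept_mass = condition_on_no_echo(TRACES)
    cost = -math.log(accept_mass)  # probabilistic cost of forbidding the echo, in nats
    print(f"P(no echo) = {accept_mass:.2f}, cost = {cost:.4f} nats")
    print(f"KL = {kl_conditioned_vs_base(conditioned, TRACES):.4f} nats")
    rng = random.Random(0)
    tries = [sample_with_rejection(TRACES, rng)[1] for _ in range(2000)]
    print(f"mean draws per accepted sample = {sum(tries) / len(tries):.2f}")
```

With these toy numbers, P(no echo) = 0.5. Both the closed-form cost and the KL come out to ln 2 ≈ 0.6931 nats, and the empirical mean number of draws per accepted sample is close to 1 / 0.5 = 2. Computing this cost exactly requires summing over all traces, which is intractable for a real LRM. A computable proxy such as ΔL is presumably meant to fill that gap, although how the paper actually constructs it is not stated here.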