
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

October 15, 2025
Authors: Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Pengfei Wan, Ying-Cong Chen
cs.AI

Abstract

Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized; only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) Selective CFG intervention, applying classifier-free guidance only at uncertain positions; and (ii) Lightweight negative-prompt guidance, reusing the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., a +1.35% average improvement on eight benchmarks for Qwen3-8B-Base and +5% on AIME2024 using Qwen3-32B-Reasoning) while remaining highly efficient.
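As a rough illustration of the selective-CFG idea described in the abstract (not the authors' implementation), the sketch below assumes per-step access to next-token logits from a conditional pass and a negative-prompt/unconditional pass; the function name, guidance scale, and entropy threshold are illustrative choices.

```python
import torch
import torch.nn.functional as F

def selective_cfg_logits(cond_logits: torch.Tensor,
                         uncond_logits: torch.Tensor,
                         guidance_scale: float = 1.5,
                         entropy_threshold: float = 2.0) -> torch.Tensor:
    """Apply classifier-free guidance only when the model is uncertain.

    cond_logits / uncond_logits: 1-D vocabulary logits for the next token from
    the conditional (main) pass and the negative-prompt / unconditional pass.
    All parameter values here are hypothetical, not taken from the paper.
    """
    # Measure uncertainty of the conditional prediction via entropy.
    probs = F.softmax(cond_logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum()

    if entropy < entropy_threshold:
        # Low-entropy position: skip intervention, decode as usual.
        return cond_logits

    # High-entropy position: standard CFG combination of the two predictions.
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

In this reading, the intervention cost is paid only at the small fraction of high-entropy positions, which is what keeps the overall overhead minimal.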