
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

October 15, 2025
Authors: Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Pengfei Wan, Ying-Cong Chen
cs.AI

Abstract

Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized; only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) Selective CFG intervention, applying classifier-free guidance only at uncertain positions; and (ii) Lightweight negative-prompt guidance, reusing the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., +1.35% average improvement on eight benchmarks for Qwen3-8B-Base and +5% on AIME2024 using Qwen3-32B-Reasoning) while remaining highly efficient.
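To make the selective-intervention idea concrete, below is a minimal sketch of entropy-gated classifier-free guidance at a single decoding step. The standard CFG combination, the entropy threshold `tau`, and the guidance weight `w` are illustrative assumptions rather than the paper's exact formulation, and the sketch does not show the KV-cache reuse that makes the paper's negative-prompt guidance lightweight.

```python
# Minimal sketch of entropy-gated classifier-free guidance (CFG) at decode time.
# Assumptions (illustrative, not taken from the paper): the standard CFG rule
#   guided = uncond + w * (cond - uncond),
# an entropy gate `tau`, and a guidance weight `w`.

import torch
import torch.nn.functional as F


def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution given logits [vocab]."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)


def selective_cfg_step(
    cond_logits: torch.Tensor,    # logits from the normal (conditional) forward pass
    uncond_logits: torch.Tensor,  # logits from the negative-prompt / unconditional pass
    tau: float = 3.0,             # entropy gate: only uncertain positions get guidance
    w: float = 1.5,               # guidance strength
) -> torch.Tensor:
    """Apply CFG only when the model is uncertain; otherwise leave logits unchanged."""
    if token_entropy(cond_logits).item() > tau:
        return uncond_logits + w * (cond_logits - uncond_logits)
    return cond_logits


if __name__ == "__main__":
    vocab = 32000
    cond = torch.randn(vocab)
    uncond = torch.randn(vocab)
    guided = selective_cfg_step(cond, uncond)
    next_token = torch.argmax(guided).item()
    print(f"entropy={token_entropy(cond).item():.2f}, next_token={next_token}")
```

Because guidance is applied only when the entropy of the conditional distribution exceeds the gate, the second (unconditional) pass is needed at only a small fraction of positions, which is where the claimed efficiency comes from.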