Efficient Reasoning with Balanced Thinking
March 12, 2026
Authors: Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian
cs.AI
Abstract
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite having the capacity to do so. These issues lead to inefficiency and potential inaccuracy, limiting practical deployment in resource-constrained settings. Existing methods for mitigating overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking and compromise accuracy. We therefore propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking through consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector that guides LRMs' reasoning trajectories. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. Extensive experiments on four models ranging from 0.5B to 32B parameters, across nine benchmarks spanning math reasoning, general question answering, and coding tasks, demonstrate that ReBalance effectively reduces output redundancy while improving accuracy, offering a general, training-free, plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance.
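To make the dynamic control idea concrete, here is a minimal sketch of confidence-driven steering, assuming a sliding window of recent token confidences and a precomputed steering vector. The function names, thresholds (`var_hi`, `conf_hi`), and the sign convention for pruning versus exploration are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import pvariance

def steering_coefficient(confidences, var_hi=0.04, conf_hi=0.9):
    """Map a window of recent token confidences to a steering coefficient.

    Positive -> prune redundancy (overthinking: high confidence variance);
    negative -> promote exploration (underthinking: consistent overconfidence);
    zero     -> leave the reasoning trajectory unchanged.
    Thresholds here are illustrative placeholders, not tuned values.
    """
    if pvariance(confidences) > var_hi:   # oscillating confidence -> overthinking
        return 1.0
    if min(confidences) > conf_hi:        # uniformly overconfident -> underthinking
        return -1.0
    return 0.0

def apply_steering(hidden, vector, confidences):
    """Add the scaled steering vector to a hidden-state vector (as lists)."""
    alpha = steering_coefficient(confidences)
    return [h + alpha * v for h, v in zip(hidden, vector)]
```

In an actual decoding loop this adjustment would be applied to the model's hidden states at each step, so the same steering vector can both shorten redundant reasoning and extend under-explored reasoning depending on the live confidence signal.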