
Efficient Reasoning with Balanced Thinking

March 12, 2026
Authors: Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian
cs.AI

Abstract

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking (expending redundant computational steps on simple problems) or underthinking (failing to explore sufficient reasoning paths despite having the capacity to do so). These issues lead to inefficiency and potential inaccuracy, limiting practical deployment in resource-constrained settings. Existing methods for mitigating overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking and thereby compromise accuracy. We therefore propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking through consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning-mode prototypes, we compute a steering vector that guides the LRM's reasoning trajectory. A dynamic control function modulates the vector's strength and direction based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. Extensive experiments on four models ranging from 0.5B to 32B parameters, across nine benchmarks in math reasoning, general question answering, and coding, demonstrate that ReBalance reduces output redundancy while improving accuracy, offering a general, training-free, plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance.