Efficient Reasoning with Balanced Thinking

Abstract

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector to guide LRMs' reasoning trajectories. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking, and promoting exploration during underthinking. Extensive experiments conducted on four models ranging from 0.5B to 32B, and across nine benchmarks in math reasoning, general question answering, and coding tasks demonstrate that ReBalance effectively reduces output redundancy while improving accuracy, offering a general, training-free, and plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance .
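
The abstract gives no formulas, so the following is only a rough sketch of how the three ingredients it names (confidence statistics, a prototype-based steering vector, and a dynamic control function) might fit together. Everything specific here is an assumption for illustration: the function names, the thresholds, the difference-of-means construction of the steering vector, and the linear scaling of the steering strength are hypothetical and are not taken from the ReBalance code. Per-step confidences are assumed to be given as numbers in [0, 1], for example the maximum next-token probability.

```python
from statistics import mean, pvariance


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors; stands in for a 'prototype'."""
    return [mean(column) for column in zip(*vectors)]


def steering_vector(overthinking_states, underthinking_states):
    """ASSUMPTION: steering vector = overthinking prototype - underthinking prototype."""
    over = mean_vector(overthinking_states)
    under = mean_vector(underthinking_states)
    return [o - u for o, u in zip(over, under)]


def control_coefficient(confidences, var_threshold=0.02, conf_threshold=0.9,
                        max_strength=1.0):
    """Map a window of recent confidences to a signed steering strength.

    ASSUMPTION: thresholds and the linear scaling are hypothetical choices.
    Negative -> move away from the overthinking prototype (prune redundancy).
    Positive -> move toward it (encourage more exploration).
    Zero     -> leave the hidden state untouched.
    """
    variance = pvariance(confidences)
    average = mean(confidences)
    if variance > var_threshold:
        # High confidence variance is read as overthinking.
        return -max_strength * min(1.0, variance / var_threshold - 1.0)
    if average > conf_threshold:
        # Consistently high confidence is read as underthinking.
        excess = (average - conf_threshold) / (1.0 - conf_threshold)
        return max_strength * min(1.0, excess)
    return 0.0


def steer(hidden_state, vector, alpha):
    """Shift one hidden state along the steering vector by alpha."""
    return [h + alpha * v for h, v in zip(hidden_state, vector)]


# Toy two-dimensional hidden states, purely illustrative.
vector = steering_vector(
    overthinking_states=[[1.0, 0.0], [3.0, 0.0]],
    underthinking_states=[[0.0, 1.0], [0.0, 3.0]],
)
alpha = control_coefficient([0.1, 0.9, 0.1, 0.9])  # oscillating confidence
steered = steer([1.0, 1.0], vector, alpha)
```

In a real model the hidden states would come from a chosen transformer layer and the shift would be applied during decoding; which layer to use and how the window of confidences is formed are left open here because the abstract does not specify them.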
