Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

Abstract

Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs): generating multiple outputs or refining individual reasoning chains can significantly boost answer accuracy. However, existing methods such as Best-of-N, majority voting, and self-reflection typically apply reasoning uniformly across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method extracts the latent steering vector associated with deeper reasoning and reapplies it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
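The extract-then-rescale idea described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's implementation: the hidden states are random NumPy arrays rather than activations from a real LLM, the steering vector is taken as a simple mean difference between states collected with and without a reasoning-inducing prompt, and the names `extract_steering_vector`, `apply_steering`, and `alpha` are hypothetical.

```python
import numpy as np


def extract_steering_vector(h_with_prompt, h_without_prompt):
    """Assumed recipe: average the per-example difference between hidden
    states gathered with and without a reasoning-inducing prompt."""
    return (h_with_prompt - h_without_prompt).mean(axis=0)


def apply_steering(h, v, alpha):
    """Shift every hidden state along v; alpha is the tunable scaling factor."""
    return h + alpha * v


# Toy stand-ins for model activations (hypothetical data, not real LLM states).
rng = np.random.default_rng(0)
hidden_dim = 8
h_plain = rng.normal(size=(16, hidden_dim))
# Pretend the reasoning prompt moves every state by 0.5 along each dimension.
h_reason = h_plain + 0.5 * np.ones(hidden_dim)

v = extract_steering_vector(h_reason, h_plain)

# Sweeping alpha gives continuous control over how far the states are pushed.
for alpha in (0.0, 0.5, 1.0, 2.0):
    steered = apply_steering(h_plain, v, alpha)
    mean_shift = np.linalg.norm(steered - h_plain, axis=1).mean()
    print(f"alpha={alpha}: mean shift norm = {mean_shift:.3f}")
```

With `alpha = 0` the states are unchanged, and larger values push them further along the extracted direction. In a real setting the shift would presumably be applied to a model's intermediate activations at inference time, with `alpha` chosen per input.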
