MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Abstract

Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computation. This work investigates how to guide LLM test-time scaling efficiently and adaptively without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budget to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide in-depth theoretical proofs to support the superiority of MUR in terms of stability and bias. MUR is comprehensively evaluated against various TTS methods on four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using three sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
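
The idea of aggregating stepwise uncertainty and gating extra computation with a single hyperparameter can be illustrated with a minimal sketch. The abstract does not give the update rule or the trigger condition, so everything below is an assumption made for illustration: the exponential moving average, the threshold test, the parameter names `alpha` and `gamma`, and the function name `momentum_uncertainty_flags` are hypothetical and are not taken from the MUR paper.

```python
from typing import List, Optional


def momentum_uncertainty_flags(
    step_uncertainties: List[float],
    alpha: float = 0.9,
    gamma: float = 0.9,
) -> List[bool]:
    """Flag the steps that would receive extra thinking budget.

    Assumed rule (illustrative only): keep an exponential moving average of
    past step uncertainties and flag a step when its own uncertainty exceeds
    that average divided by gamma. A smaller gamma raises the threshold, so
    fewer steps are flagged.
    """
    flags: List[bool] = []
    momentum: Optional[float] = None  # running aggregate of past uncertainties
    for u in step_uncertainties:
        if momentum is None:
            # No history yet: this sketch never flags the first step.
            flags.append(False)
            momentum = u
            continue
        # Compare the current step against the aggregate of earlier steps.
        flags.append(u > momentum / gamma)
        # Fold the current step into the running aggregate.
        momentum = alpha * momentum + (1 - alpha) * u
    return flags


if __name__ == "__main__":
    uncertainties = [1.0, 1.0, 2.0, 1.0]  # made-up per-step uncertainty scores
    print(momentum_uncertainty_flags(uncertainties, alpha=0.5, gamma=0.8))
```

In this sketch only the third step, whose uncertainty jumps well above the running aggregate, is flagged. Lowering `gamma` makes the gate stricter, which is one way a single hyperparameter could trade accuracy against compute.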
