MUR: Momentum Uncertainty guided Reasoning for Large Language Models
July 20, 2025
Authors: Hang Yan, Fangzhi Xu, Rongman Xu, Yifei Li, Jian Zhang, Haoran Luo, Xiaobao Wu, Luu Anh Tuan, Haiteng Zhao, Qika Lin, Jun Liu
cs.AI
Abstract
Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide an in-depth theoretical analysis supporting MUR's advantages in stability and bias. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using recent Qwen3 models of different sizes (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
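To make the idea concrete, the momentum-style aggregation of stepwise uncertainty described above can be sketched as an exponential moving average, with extra compute triggered only when a step's uncertainty spikes above the accumulated momentum. This is a minimal illustrative sketch, not the paper's actual formulation: the update rule, the trigger condition, and the `gamma`/`slack` parameters here are assumptions for exposition only.

```python
# Hypothetical sketch of momentum-aggregated uncertainty (NOT the
# paper's exact algorithm): an exponential moving average over
# per-step uncertainties, analogous to momentum in physics.

def momentum_uncertainty(step_uncertainties, gamma=0.9):
    """Aggregate per-step uncertainties into a running momentum term.

    gamma is the single hyperparameter controlling how slowly the
    momentum adapts (loosely mirroring the abstract's gamma-control).
    """
    m = 0.0
    trace = []
    for u in step_uncertainties:
        m = gamma * m + (1 - gamma) * u  # momentum-style update
        trace.append(m)
    return trace


def is_critical_step(u_t, m_prev, slack=1.0):
    """Flag a step as critical (worth extra test-time compute) when its
    uncertainty exceeds the accumulated momentum by a margin `slack`
    (a made-up threshold rule for illustration)."""
    return u_t > m_prev + slack
```

Under this kind of scheme, steady low-uncertainty steps keep the momentum low and proceed cheaply, while a sudden uncertainty spike stands out against the momentum baseline and can receive a larger thinking budget; the actual criterion used by MUR may differ.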