MUR: Momentum Uncertainty guided Reasoning for Large Language Models
July 20, 2025
Authors: Hang Yan, Fangzhi Xu, Rongman Xu, Yifei Li, Jian Zhang, Haoran Luo, Xiaobao Wu, Luu Anh Tuan, Haiteng Zhao, Qika Lin, Jun Liu
cs.AI
Abstract
Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide an in-depth theoretical analysis supporting MUR's advantages in stability and bias. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using recent Qwen3 models of different sizes (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
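To make the idea concrete, the momentum-style aggregation of stepwise uncertainty described above can be sketched as an exponential moving average, with extra compute triggered only when a step's uncertainty spikes above the accumulated momentum. This is a minimal illustrative sketch, not the paper's actual formulation: the update rule, the trigger condition, and the `gamma`/`slack` parameters here are assumptions for exposition only.

```python
# Hypothetical sketch of momentum-aggregated uncertainty (NOT the
# paper's exact algorithm): an exponential moving average over
# per-step uncertainties, analogous to momentum in physics.

def momentum_uncertainty(step_uncertainties, gamma=0.9):
    """Aggregate per-step uncertainties into a running momentum term.

    gamma is the single hyperparameter controlling how slowly the
    momentum adapts (loosely mirroring the abstract's gamma-control).
    """
    m = 0.0
    trace = []
    for u in step_uncertainties:
        m = gamma * m + (1 - gamma) * u  # momentum-style update
        trace.append(m)
    return trace


def is_critical_step(u_t, m_prev, slack=1.0):
    """Flag a step as critical (worth extra test-time compute) when its
    uncertainty exceeds the accumulated momentum by a margin `slack`
    (a made-up threshold rule for illustration)."""
    return u_t > m_prev + slack
```

Under this kind of scheme, steady low-uncertainty steps keep the momentum low and proceed cheaply, while a sudden uncertainty spike stands out against the momentum baseline and can receive a larger thinking budget; the actual criterion used by MUR may differ.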