

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

July 20, 2025
作者: Hang Yan, Fangzhi Xu, Rongman Xu, Yifei Li, Jian Zhang, Haoran Luo, Xiaobao Wu, Luu Anh Tuan, Haiteng Zhao, Qika Lin, Jun Liu
cs.AI

Abstract

Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide in-depth theoretical proof to support the superiority of MUR in terms of stability and biases. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using different sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
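The abstract describes tracking stepwise uncertainty as a momentum-style running aggregate, with a single gamma hyperparameter controlling the reasoning budget. A minimal sketch of that idea, assuming (this is our reading, not the paper's released code) that momentum uncertainty is an exponential moving average of per-step uncertainties and that extra test-time compute is triggered when a step's uncertainty exceeds the running momentum:

```python
def momentum_uncertainty(step_uncertainties, gamma=0.9):
    """Exponential moving average of per-step uncertainties.

    gamma weights accumulated history against the current step;
    returns the momentum value after each step.
    """
    m = 0.0
    history = []
    for u in step_uncertainties:
        m = gamma * m + (1 - gamma) * u
        history.append(m)
    return history


def should_allocate_budget(momentum, current_uncertainty, ratio=1.0):
    """Hypothetical trigger rule: spend extra thinking budget (e.g. invoke
    a TTS method on this step) when the current step's uncertainty
    exceeds the running momentum by a tunable ratio."""
    return current_uncertainty > ratio * momentum
```

Under this reading, a low-gamma setting reacts quickly to uncertainty spikes (spending budget more eagerly), while a high gamma smooths over transient spikes and reserves budget for sustained uncertainty.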
July 25, 2025