

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

July 20, 2025
作者: Hang Yan, Fangzhi Xu, Rongman Xu, Yifei Li, Jian Zhang, Haoran Luo, Xiaobao Wu, Luu Anh Tuan, Haiteng Zhao, Qika Lin, Jun Liu
cs.AI

Abstract

Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide in-depth theoretical proof to support the superiority of MUR in terms of stability and biases. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using different sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
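The abstract describes tracking stepwise uncertainty as a momentum-style running aggregate, with a single gamma hyperparameter controlling the reasoning budget. A minimal sketch of that idea, assuming (this is our reading, not the paper's released code) that momentum uncertainty is an exponential moving average of per-step uncertainties and that extra test-time compute is triggered when a step's uncertainty exceeds the running momentum:

```python
def momentum_uncertainty(step_uncertainties, gamma=0.9):
    """Exponential moving average of per-step uncertainties.

    gamma weights accumulated history against the current step;
    returns the momentum value after each step.
    """
    m = 0.0
    history = []
    for u in step_uncertainties:
        m = gamma * m + (1 - gamma) * u
        history.append(m)
    return history


def should_allocate_budget(momentum, current_uncertainty, ratio=1.0):
    """Hypothetical trigger rule: spend extra thinking budget (e.g. invoke
    a TTS method on this step) when the current step's uncertainty
    exceeds the running momentum by a tunable ratio."""
    return current_uncertainty > ratio * momentum
```

Under this reading, a low-gamma setting reacts quickly to uncertainty spikes (spending budget more eagerly), while a high gamma smooths over transient spikes and reserves budget for sustained uncertainty.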
July 25, 2025