AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Abstract

This paper presents AlphaOne (alpha1), a universal framework for modulating the reasoning progress of large reasoning models (LRMs) at test time. alpha1 first introduces the alpha moment, which represents the scaled thinking phase through a universal parameter alpha. Within this scaled pre-alpha-moment phase, it dynamically schedules slow-thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the alpha moment, alpha1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on challenging benchmarks across mathematical, coding, and scientific domains demonstrate alpha1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
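The two-phase schedule described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the token strings `"wait"` and `"</think>"`, the constant insertion probability `p_insert`, the definition of the alpha moment as `alpha` times a baseline thinking length, and all function names are hypothetical choices made for illustration only. The abstract does not specify any of them.

```python
import random

# Hypothetical token strings; the abstract does not name the actual tokens.
SLOW_TOKEN = "wait"             # assumed reasoning transition token
END_THINK_TOKEN = "</think>"    # assumed end-of-thinking token


def alpha_moment(alpha: float, base_think_len: int) -> int:
    """Assumed definition: the alpha moment is alpha times a baseline thinking length."""
    return int(alpha * base_think_len)


def schedule_transitions(num_steps, alpha, base_think_len, p_insert, seed=0):
    """Return a list of (step, token) insertion events.

    Before the alpha moment, each step inserts SLOW_TOKEN with probability
    p_insert (an independent Bernoulli trial per step). At the alpha moment,
    END_THINK_TOKEN is inserted deterministically and scheduling stops.
    """
    rng = random.Random(seed)
    t_alpha = alpha_moment(alpha, base_think_len)
    events = []
    for t in range(num_steps):
        if t < t_alpha:
            # Pre-alpha-moment phase: stochastic slow-thinking transitions.
            if rng.random() < p_insert:
                events.append((t, SLOW_TOKEN))
        else:
            # Post-alpha-moment phase: deterministic end of slow thinking.
            events.append((t, END_THINK_TOKEN))
            break
    return events


if __name__ == "__main__":
    for step, token in schedule_transitions(100, alpha=1.5, base_think_len=40, p_insert=0.2):
        print(step, token)
```

In this sketch, a larger `alpha` extends the phase in which slow-thinking transitions may be inserted, while `p_insert` controls how densely they occur; a real system would apply these events to a model's decoding loop rather than to abstract step indices.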