AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Abstract

This paper presents AlphaOne (alpha1), a universal framework for modulating reasoning progress in large reasoning models (LRMs) at test time. alpha1 first introduces the alpha moment, which represents the scaled thinking phase with a universal parameter alpha. Within this scaled pre-alpha-moment phase, it dynamically schedules slow thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the alpha moment, alpha1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on various challenging benchmarks across mathematical, coding, and scientific domains demonstrate alpha1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
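The two-phase behavior described in the abstract can be illustrated with a small, model-free sketch. It is not the authors' implementation: the function names, the linear decay of the insertion probability, the token strings "wait" and "</think>", and the definition of the alpha moment as alpha times a reference thinking length (base_len) are all assumptions made for illustration. A fixed token stream also stands in for the model; in a real decoder each inserted token would change what is generated next.

```python
import random
from typing import Callable, Iterable, List


def linear_anneal(t: int, alpha_moment: int, p_start: float = 0.4) -> float:
    """Hypothetical schedule: the insertion probability decays linearly
    from p_start to 0 as position t approaches the alpha moment."""
    if alpha_moment <= 0 or t >= alpha_moment:
        return 0.0
    return p_start * (1.0 - t / alpha_moment)


def modulate(
    stream: Iterable[str],
    alpha: float,
    base_len: int,
    rng: random.Random,
    schedule: Callable[[int, int], float] = linear_anneal,
    transition: str = "wait",          # assumed reasoning transition token
    end_of_thinking: str = "</think>",  # assumed end-of-thinking token
) -> List[str]:
    """Sketch of slow-to-fast modulation over a stand-in token stream."""
    # Assumption: the alpha moment is alpha times a reference thinking length.
    alpha_moment = int(alpha * base_len)
    out: List[str] = []
    for tok in stream:
        if len(out) >= alpha_moment:
            break  # alpha moment reached: stop slow thinking
        out.append(tok)
        # Pre-alpha-moment phase: one Bernoulli trial per emitted token
        # decides whether a transition token is inserted.
        if rng.random() < schedule(len(out), alpha_moment):
            out.append(transition)
    # Post-alpha-moment phase: terminate thinking deterministically.
    out.append(end_of_thinking)
    return out
```

For example, calling modulate(["step"] * 100, alpha=1.5, base_len=20, rng=random.Random(0)) returns a trace of at most 31 tokens that always ends with the end-of-thinking token. Raising alpha lengthens the slow-thinking phase, and swapping in a different schedule function changes how densely transition tokens are inserted.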
