AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Abstract

This paper presents AlphaOne (alpha1), a universal framework for modulating reasoning progress in large reasoning models (LRMs) at test time. alpha1 first introduces the alpha moment, which represents the scaled thinking phase with a universal parameter alpha. Within this scaled pre-alpha moment phase, alpha1 dynamically schedules slow thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the alpha moment, alpha1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on various challenging benchmarks across mathematical, coding, and scientific domains demonstrate alpha1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
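The two-phase schedule in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, only one possible reading under stated assumptions: the alpha moment is taken to be `alpha` times an average thinking length, the insertion probability follows a linear decay, and a fixed list of reasoning steps stands in for model generation. The names `WAIT`, `END_THINK`, `linear_anneal`, `modulate`, and `avg_think_len` are all hypothetical.

```python
import random
from typing import Callable, List

WAIT = "wait"            # hypothetical slow-thinking transition token
END_THINK = "</think>"   # hypothetical end-of-thinking token


def linear_anneal(t: int, alpha_moment: int) -> float:
    """Assumed schedule: insertion probability decays linearly from 1 to 0."""
    return max(0.0, 1.0 - t / alpha_moment)


def modulate(
    steps: List[str],
    alpha: float,
    avg_think_len: int,
    schedule: Callable[[int, int], float] = linear_anneal,
    seed: int = 0,
) -> List[str]:
    """Toy slow-to-fast modulation over a pre-given list of reasoning steps.

    Before the alpha moment, a transition token is inserted after each step
    with probability schedule(t, alpha_moment), i.e. a Bernoulli draw.
    Once the alpha moment is reached, thinking is ended deterministically.
    """
    rng = random.Random(seed)
    # Assumption: the alpha moment scales an average thinking length by alpha.
    alpha_moment = int(alpha * avg_think_len)
    out: List[str] = []
    for step in steps:
        if len(out) >= alpha_moment:
            break  # alpha moment reached: stop slow thinking
        out.append(step)
        # Bernoulli trial deciding whether to insert a slow-thinking transition.
        if rng.random() < schedule(len(out), alpha_moment):
            out.append(WAIT)
    out.append(END_THINK)  # deterministic termination of the thinking phase
    return out


if __name__ == "__main__":
    toy_steps = [f"step{i}" for i in range(20)]
    print(modulate(toy_steps, alpha=1.4, avg_think_len=10))
```

In this sketch, a larger `alpha` lengthens the phase in which slow-thinking transitions may be inserted, while a constant schedule of 0.0 inserts none and simply truncates thinking at the alpha moment. A real system would generate each step conditioned on the inserted tokens rather than reading from a fixed list.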

