AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with the OpenMathReasoning Dataset

Abstract


This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models rests on three pillars. First, we create a large-scale dataset of 540K unique high-quality math problems, including olympiad-level problems, together with 3.2M long-reasoning solutions. Second, we develop a novel method that integrates code execution with long-reasoning models through iterative training, generation, and quality filtering, yielding 1.7M high-quality Tool-Integrated Reasoning (TIR) solutions. Third, we build a pipeline that trains models to select the most promising solution from many candidates, and we show that this generative solution selection (GenSelect) can significantly improve on the majority-voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license.
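For readers unfamiliar with the baseline GenSelect is compared against, here is a minimal Python sketch of majority voting over candidate final answers. It also includes a hook where a solution selector would plug in. The sketch assumes every candidate solution has already been reduced to a final-answer string.

The names `majority_vote`, `select_solution`, and the `selector` callable are hypothetical and are not the API of the released code. The selector is only a placeholder for a GenSelect-style model, because this abstract does not describe how that model is prompted or trained.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Majority-voting baseline: return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one candidate answer")
    return Counter(answers).most_common(1)[0][0]


def select_solution(
    solutions: Sequence[str],
    answers: Sequence[str],
    selector: Optional[Callable[[Sequence[str]], int]] = None,
) -> str:
    """Return a final answer chosen from many candidate solutions.

    `selector` is a hypothetical stand-in for a generative selector
    (GenSelect-style): it receives the full solution texts and returns
    the index of the most promising one. With no selector, fall back
    to the majority-voting baseline over the extracted answers.
    """
    if len(solutions) != len(answers):
        raise ValueError("solutions and answers must have the same length")
    if selector is None:
        return majority_vote(answers)
    index = selector(solutions)
    if not 0 <= index < len(answers):
        raise IndexError("selector returned an out-of-range index")
    return answers[index]


if __name__ == "__main__":
    answers = ["42", "17", "42", "5", "17", "42"]
    solutions = [f"... so the answer is {a}" for a in answers]
    print(select_solution(solutions, answers))  # 42 (majority vote, 3 of 6)
    # Hypothetical selector that always prefers the second candidate.
    print(select_solution(solutions, answers, selector=lambda s: 1))  # 17
```

Majority voting sees only the extracted answers. A selector receives the full solution texts, so in principle it can pick a correct answer that only a minority of candidates reached, which voting would discard.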