DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Abstract

The capacity for complex mathematical reasoning is a key benchmark for artificial intelligence. While reinforcement learning (RL) applied to large language models (LLMs) shows promise, progress is significantly hindered by the lack of large-scale training data that is sufficiently challenging, has verifiable answer formats suitable for RL, and is free from contamination with evaluation benchmarks. To address these limitations, we introduce DeepMath-103K, a new large-scale dataset of approximately 103K mathematical problems, specifically designed to train advanced reasoning models via RL. DeepMath-103K is curated through a rigorous pipeline involving source analysis, stringent decontamination against numerous benchmarks, and filtering for high difficulty (primarily Levels 5-9), making it significantly more challenging than existing open resources. Each problem includes a verifiable final answer, enabling rule-based RL, and three distinct R1-generated solutions suitable for diverse training paradigms such as supervised fine-tuning or distillation. Spanning a wide range of mathematical topics, DeepMath-103K promotes the development of generalizable reasoning. We demonstrate that models trained on DeepMath-103K achieve significant improvements on challenging mathematical benchmarks, validating its effectiveness. We release DeepMath-103K publicly to facilitate community progress in building more capable AI reasoning systems: https://github.com/zwhe99/DeepMath.
