DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Abstract

The capacity for complex mathematical reasoning is a key benchmark for artificial intelligence. While reinforcement learning (RL) applied to LLMs shows promise, progress is significantly hindered by the lack of large-scale training data that is sufficiently challenging, possesses verifiable answer formats suitable for RL, and is free from contamination with evaluation benchmarks. To address these limitations, we introduce DeepMath-103K, a new, large-scale dataset comprising approximately 103K mathematical problems, specifically designed to train advanced reasoning models via RL. DeepMath-103K is curated through a rigorous pipeline involving source analysis, stringent decontamination against numerous benchmarks, and filtering for high difficulty (primarily Levels 5-9), significantly exceeding existing open resources in challenge. Each problem includes a verifiable final answer, enabling rule-based RL, and three distinct R1-generated solutions suitable for diverse training paradigms like supervised fine-tuning or distillation. Spanning a wide range of mathematical topics, DeepMath-103K promotes the development of generalizable reasoning. We demonstrate that models trained on DeepMath-103K achieve significant improvements on challenging mathematical benchmarks, validating its effectiveness. We release DeepMath-103K publicly to facilitate community progress in building more capable AI reasoning systems: https://github.com/zwhe99/DeepMath.
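To illustrate what a verifiable final answer makes possible, the following is a minimal sketch of a rule-based reward. It is not the authors' verifier. It assumes, hypothetically, that the model writes its final answer inside a LaTeX \boxed{...} and that the reference answer is a plain string; the function names extract_boxed, normalize, and rule_based_reward are invented for this example.

```python
from fractions import Fraction


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None.

    Assumption: final answers are wrapped in \\boxed{...}.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer):
    """Map numeric strings such as '0.5' or '1/2' to an exact Fraction.

    Anything that does not parse as a number is compared as a string.
    """
    answer = answer.strip().replace(" ", "")
    try:
        return Fraction(answer)
    except (ValueError, ZeroDivisionError):
        return answer


def rule_based_reward(response, reference):
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

Because the reward depends only on the final answer, it needs no learned reward model. A practical verifier would also have to handle symbolic equivalence (for example, \frac{1}{2} versus 0.5), which this sketch deliberately leaves out.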

