
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

December 17, 2025
Authors: Wei Du, Shubham Toshniwal, Branislav Kisacanin, Sadegh Mahdavi, Ivan Moshkov, George Armstrong, Stephen Ge, Edgar Minasyan, Feng Chen, Igor Gitman
cs.AI

Abstract

High-quality mathematical reasoning supervision requires diverse reasoning styles, long-form traces, and effective tool integration, capabilities that existing datasets provide only in limited form. Leveraging the multi-mode generation ability of gpt-oss-120b, we introduce Nemotron-Math, a large-scale mathematical reasoning dataset containing 7.5M solution traces across high, medium, and low reasoning modes, each available both with and without Python tool-integrated reasoning (TIR). The dataset integrates 85K curated AoPS problems with 262K community-sourced StackExchange-Math problems, combining structured competition tasks with diverse real-world mathematical queries. We conduct controlled evaluations to assess the dataset quality. Nemotron-Math consistently outperforms the original OpenMathReasoning on matched AoPS problems. Incorporating StackExchange-Math substantially improves robustness and generalization, especially on HLE-Math, while preserving accuracy on math competition benchmarks. To support efficient long-context training, we develop a sequential bucketed strategy that accelerates 128K context-length fine-tuning by 2-3× without significant accuracy loss. Overall, Nemotron-Math enables state-of-the-art performance, including 100% maj@16 accuracy on AIME 2024 and 2025 with Python TIR.
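
The maj@16 figure quoted in the abstract is usually read as majority voting: sample 16 solutions per problem, take the most frequent final answer, and score that answer against the reference. The following is a minimal Python sketch of that reading, not the authors' evaluation code. The function names `majority_answer` and `maj_at_k` are illustrative, and the sketch assumes final answers have already been extracted and normalized to comparable strings.

```python
from collections import Counter


def majority_answer(samples):
    """Return the most frequent answer among the sampled final answers.

    Ties are broken in favor of the answer seen first, which is how
    Counter.most_common orders equal counts.
    """
    return Counter(samples).most_common(1)[0][0]


def maj_at_k(sampled_answers, references, k=16):
    """Fraction of problems whose majority answer over the first k samples
    matches the reference answer (hypothetical helper, for illustration).
    """
    if len(sampled_answers) != len(references):
        raise ValueError("need one list of samples per reference answer")
    correct = 0
    for samples, reference in zip(sampled_answers, references):
        if majority_answer(samples[:k]) == reference:
            correct += 1
    return correct / len(references)


# Toy data: two problems with three sampled answers each.
toy_samples = [["42", "42", "41"], ["7", "8", "8"]]
toy_references = ["42", "7"]
toy_score = maj_at_k(toy_samples, toy_references, k=3)
```

On the toy data the first problem's majority answer matches its reference and the second's does not, so `toy_score` is 0.5. Under this reading, a 100% maj@16 score means the majority answer is correct on every problem, even if some individual samples are wrong.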