SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

Abstract

The demand for Large Language Models (LLMs) capable of sophisticated mathematical reasoning is growing across industries. However, the development of performant mathematical LLMs is critically bottlenecked by the scarcity of difficult, novel training data. We introduce SAND-Math (Synthetic Augmented Novel and Difficult Mathematics problems and solutions), a pipeline that addresses this scarcity by first generating high-quality problems from scratch and then systematically elevating their complexity via a new Difficulty Hiking step. We demonstrate the effectiveness of our approach through two key findings. First, augmenting a strong baseline with SAND-Math data significantly boosts performance, outperforming the next-best synthetic dataset by 17.85 absolute points on the AIME25 benchmark. Second, in a dedicated ablation study, we show that our Difficulty Hiking process is highly effective: by increasing average problem difficulty from 5.02 to 5.98, this step lifts AIME25 performance from 46.38% to 49.23%. The full generation pipeline, final dataset, and a fine-tuned model form a practical and scalable toolkit for building more capable and efficient mathematical reasoning LLMs. The SAND-Math dataset is released at https://huggingface.co/datasets/amd/SAND-MATH
