SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
July 28, 2025
Authors: Chaitanya Manem, Pratik Prabhanjan Brahma, Prakamya Mishra, Zicheng Liu, Emad Barsoum
cs.AI
Abstract
The demand for Large Language Models (LLMs) capable of sophisticated
mathematical reasoning is growing across industries. However, the development
of performant mathematical LLMs is critically bottlenecked by the scarcity of
difficult, novel training data. We introduce SAND-Math (Synthetic
Augmented Novel and Difficult Mathematics problems and solutions), a pipeline
that addresses this by first generating high-quality problems from scratch and
then systematically elevating their complexity via a new Difficulty
Hiking step. We demonstrate the effectiveness of our approach through two key
findings. First, augmenting a strong baseline with SAND-Math data significantly
boosts performance, outperforming the next-best synthetic dataset by
17.85 absolute points on the AIME25 benchmark. Second, in a
dedicated ablation study, we show our Difficulty Hiking process is highly
effective: by increasing average problem difficulty from 5.02 to 5.98, this
step lifts AIME25 performance from 46.38% to 49.23%. The full generation
pipeline, final dataset, and a fine-tuned model form a practical and scalable
toolkit for building more capable and efficient mathematical reasoning LLMs.
The SAND-Math dataset is released at
https://huggingface.co/datasets/amd/SAND-MATH
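
To put the ablation figures in perspective, the short sketch below works out the gains implied by the numbers quoted in the abstract. The variable names are illustrative only and are not part of the SAND-Math pipeline; the relative gain is a derived quantity, not a figure reported by the authors.

```python
# Figures quoted in the abstract for the Difficulty Hiking ablation.
difficulty_before, difficulty_after = 5.02, 5.98   # average problem difficulty
aime25_before, aime25_after = 46.38, 49.23         # AIME25 performance, percent

# Absolute changes, rounded to the precision used in the abstract.
difficulty_gain = round(difficulty_after - difficulty_before, 2)  # 0.96
aime25_gain = round(aime25_after - aime25_before, 2)              # 2.85

# Relative improvement over the pre-hiking score (derived, not reported).
relative_gain = round(100 * aime25_gain / aime25_before, 2)

print(f"Difficulty gain: {difficulty_gain}")
print(f"AIME25 gain: {aime25_gain} absolute points")
print(f"Relative AIME25 gain: {relative_gain}%")
```

In other words, raising average difficulty by roughly one point corresponds to a gain of just under three absolute points on AIME25 in this ablation.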