InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
August 9, 2024
Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu
cs.AI
Abstract
Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT)
methods have greatly enhanced language models' mathematical reasoning
capabilities, facilitating their integration into instruction tuning datasets
with LLMs. However, existing methods for large-scale dataset creation require
substantial seed data and high computational costs for data synthesis, posing
significant challenges for scalability. We introduce InfinityMATH, a scalable
instruction tuning dataset for programmatic mathematical reasoning. The
construction pipeline emphasizes decoupling numbers from mathematical problems
to synthesize number-independent programs, enabling efficient and flexible
scaling while minimizing dependency on specific numerical values. Fine-tuning
experiments with open-source language and code models, such as Llama2 and
CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned
models showed significant relative improvements on both in-domain and
out-of-domain benchmarks, ranging from 184.7% to 514.3% on average.
Additionally, these models exhibited high robustness on the GSM8K+ and MATH+
benchmarks, which are augmented versions of the test sets with simple number
variations. InfinityMATH ensures that models are more versatile and effective
across a broader range of mathematical problems. The data is available at
https://huggingface.co/datasets/flagopen/InfinityMATH.
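The core idea of decoupling numbers from problems can be illustrated with a minimal sketch. The example below is not the paper's actual pipeline; the function names (`solve`, `make_variant`) and the word-problem template are hypothetical, chosen only to show how one number-independent program can be re-instantiated with fresh numerical values to scale a dataset cheaply.

```python
# Hypothetical sketch of a "number-independent program": numeric values in a
# word problem are abstracted into parameters, so a single program template
# can generate many (question, answer) pairs by swapping in new numbers.

def solve(apples_start: int, apples_eaten: int, apples_bought: int) -> int:
    """Number-independent PoT solution: 'Tom has N apples, eats E,
    then buys B more. How many apples does he have now?'"""
    return apples_start - apples_eaten + apples_bought

def make_variant(template: str, values: dict) -> tuple[str, int]:
    """Instantiate problem text and answer from one set of numbers."""
    return template.format(**values), solve(**values)

template = (
    "Tom has {apples_start} apples, eats {apples_eaten}, "
    "then buys {apples_bought} more. How many apples does he have now?"
)

q1, a1 = make_variant(template, {"apples_start": 5, "apples_eaten": 2, "apples_bought": 4})
q2, a2 = make_variant(template, {"apples_start": 30, "apples_eaten": 11, "apples_bought": 7})
print(a1, a2)  # 7 26
```

Because the program's logic never hard-codes the numbers, benchmarks with simple number variations (such as the GSM8K+ and MATH+ sets described above) test exactly this property: whether a model learned the procedure rather than memorizing specific values.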