InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
August 9, 2024
Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu
cs.AI
Abstract
Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT)
methods have greatly enhanced language models' mathematical reasoning
capabilities, facilitating their integration into instruction tuning datasets
with LLMs. However, existing methods for large-scale dataset creation require
substantial seed data and high computational costs for data synthesis, posing
significant challenges for scalability. We introduce InfinityMATH, a scalable
instruction tuning dataset for programmatic mathematical reasoning. The
construction pipeline emphasizes decoupling numbers from mathematical problems
to synthesize number-independent programs, enabling efficient and flexible
scaling while minimizing dependency on specific numerical values. Fine-tuning
experiments with open-source language and code models, such as Llama2 and
CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned
models showed significant relative improvements on both in-domain and
out-of-domain benchmarks, ranging from 184.7% to 514.3% on average.
Additionally, these models exhibited high robustness on the GSM8K+ and MATH+
benchmarks, which are augmented versions of the test sets with only numerical
variations. InfinityMATH ensures that models are more versatile and effective
across a broader range of mathematical problems. The data is available at
https://huggingface.co/datasets/flagopen/InfinityMATH.
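The pipeline's core idea, decoupling numerals from a problem so that the solution program is number-independent, can be illustrated with a minimal sketch. The function names (`decouple_numbers`, `solution_program`), the regex-based number extraction, and the `{var0}`-style template format below are illustrative assumptions, not the paper's actual implementation:

```python
import random
import re

NUM_RE = re.compile(r"\d+(?:\.\d+)?")

def decouple_numbers(problem: str):
    """Abstract every numeric literal in a word problem into a placeholder
    ({var0}, {var1}, ...) and return the template plus the original values."""
    values = []

    def repl(match):
        values.append(float(match.group()))
        return "{var%d}" % (len(values) - 1)

    return NUM_RE.sub(repl, problem), values

def solution_program(var0: float, var1: float) -> float:
    """A number-independent, PoT-style program for the templated problem
    below; it is valid for any instantiation of the placeholders."""
    return var0 * var1

if __name__ == "__main__":
    problem = "A pen costs 3 dollars. How much do 12 pens cost?"
    template, values = decouple_numbers(problem)
    print(template)                   # A pen costs {var0} dollars. ... {var1} pens ...
    print(solution_program(*values))  # 36.0, the answer to the seed problem

    # Scaling step: sampling fresh numbers yields a new training instance
    # without another LLM call, which is what keeps synthesis cheap.
    new_values = [random.randint(2, 50) for _ in values]
    print(template.format(var0=new_values[0], var1=new_values[1]),
          "->", solution_program(*new_values))
```

Because the solution program is written against placeholders rather than literal values, new problem/program/answer triples can be generated by sampling fresh numbers alone, which is consistent with the abstract's claim of efficient, flexible scaling with minimal dependency on specific numerical values.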