

MathScale: Scaling Instruction Tuning for Mathematical Reasoning

March 5, 2024
Authors: Zhengyang Tang, Xingxing Zhang, Benyou Wang, Furu Wei
cs.AI

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanism of human mathematical learning, it first extracts topics and knowledge points from seed math questions and then builds a concept graph, which is subsequently used to generate new math questions. MathScale exhibits effective scalability along the size axis of the math dataset we generate. As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of math word problems comprising ten datasets (including GSM8K and MATH) that cover K-12, college, and competition-level math problems. We apply MathScaleQA to fine-tune open-source LLMs (e.g., LLaMA-2 and Mistral), resulting in significantly improved mathematical reasoning capabilities. Evaluated on MwpBench, MathScale-7B achieves state-of-the-art performance across all datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy.
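
The abstract sketches the pipeline only at a high level: extract topics and knowledge points from seed questions, connect them in a concept graph, then sample from that graph to prompt an LLM for new question-answer pairs. The Python sketch below is a minimal illustration of that idea under our own assumptions; the graph construction, the random-walk sampling, and every function and variable name here are hypothetical and do not reflect the paper's actual implementation.

```python
import random
from collections import defaultdict

# Hypothetical seed extractions: in MathScale, (topic, knowledge-point) pairs
# are extracted from seed questions by a frontier LLM (e.g., GPT-3.5);
# here they are hard-coded purely for illustration.
seed_extractions = [
    ("linear equations", ["isolating a variable", "distributive property"]),
    ("linear equations", ["slope-intercept form"]),
    ("percentages", ["percent change", "isolating a variable"]),
]

def build_concept_graph(extractions):
    """Link each topic to its knowledge points, and co-occurring knowledge
    points to each other, as a simple undirected adjacency map."""
    graph = defaultdict(set)
    for topic, kps in extractions:
        for kp in kps:
            graph[topic].add(kp)
            graph[kp].add(topic)
        for a in kps:
            for b in kps:
                if a != b:
                    graph[a].add(b)
    return graph

def sample_concepts(graph, topic, num_kps=2, max_steps=100, seed=None):
    """Random-walk from a topic to gather related knowledge points
    that a newly generated question should exercise."""
    rng = random.Random(seed)
    kps, node = [], topic
    for _ in range(max_steps):
        if len(kps) >= num_kps:
            break
        node = rng.choice(sorted(graph[node]))  # step to a random neighbor
        if node != topic and node not in kps:
            kps.append(node)
    return kps

graph = build_concept_graph(seed_extractions)
kps = sample_concepts(graph, "linear equations", seed=0)
prompt = (f"Write a new math word problem about linear equations "
          f"that requires: {', '.join(kps)}. Then solve it step by step.")
print(prompt)  # this prompt would be sent to an LLM to produce a QA pair
```

At scale, each sampled prompt would be sent to a frontier LLM and the returned question-answer pair collected into the training set; repeating this over many sampled concept combinations is, presumably, how a two-million-pair dataset like MathScaleQA accumulates.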