MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Abstract

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite this success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems because of the complex reasoning procedures involved. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions: each question is rewritten from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. We then fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (i.e., GSM8K and MATH) demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% accuracy on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5 and 8.7 percentage points, respectively. In particular, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, MetaMath models of different sizes, and the training code for public use.
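The abstract does not spell out which rewriting perspectives are used. As a minimal sketch of how a question can be rewritten "without extra knowledge", assume a hypothetical "backward" perspective. In this perspective, one number in a seed question is hidden, the original answer is revealed, and the model is asked for the hidden number. The rewrite needs nothing beyond the original question-answer pair. The `QA` type, the `backward_question` function, and the prompt template are illustrative assumptions, not MetaMath's actual data-generation code.

```python
import re
from dataclasses import dataclass


@dataclass
class QA:
    question: str
    answer: str


def backward_question(qa: QA, mask_index: int = 0) -> QA:
    """Hypothetical backward rewrite (an assumption, not MetaMath's code):
    hide one number in the question, reveal the original answer, and ask
    for the hidden number instead."""
    # Find every integer or decimal literal in the seed question.
    numbers = list(re.finditer(r"\d+(?:\.\d+)?", qa.question))
    if not numbers:
        raise ValueError("question contains no number to mask")
    target = numbers[mask_index]
    # Replace the chosen number with the placeholder X.
    masked = qa.question[: target.start()] + "X" + qa.question[target.end():]
    rewritten = (
        f"{masked} If we know the answer to the above question is {qa.answer}, "
        "what is the value of the unknown variable X?"
    )
    # The hidden number becomes the new ground-truth answer, so the rewrite
    # uses only information already present in the seed pair.
    return QA(question=rewritten, answer=target.group())


seed = QA("Tom has 3 apples and buys 5 more. How many apples does he have now?", "8")
rewritten = backward_question(seed)
print(rewritten.question)
print(rewritten.answer)
```

In a real pipeline, an LLM would typically produce the rewrites and their reasoning paths. The template here only shows that a new, checkable question can be derived from an existing question-answer pair alone.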
