MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Abstract

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
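The interleaving of natural language, code, and execution results described above can be illustrated with a minimal sketch. This is not the MathCoder implementation or the MathCodeInstruct data format: the `run_interleaved` function, the `"text"`/`"code"`/`"execution"` tags, and the sample problem are all hypothetical and chosen only for illustration. The sketch assumes each code segment is trusted Python that can be run with `exec` in a shared namespace.

```python
import contextlib
import io


def run_interleaved(segments):
    """Execute the code segments of a solution and interleave their output.

    `segments` is a list of (kind, content) pairs, where kind is "text" or
    "code". These tags are hypothetical, not MathCoder's actual format.
    """
    namespace = {}  # shared across code segments, like a notebook session
    transcript = []
    for kind, content in segments:
        transcript.append((kind, content))
        if kind == "code":
            # Capture whatever the code prints as its execution result.
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(content, namespace)
            transcript.append(("execution", buffer.getvalue().strip()))
    return transcript


# Hypothetical sample solution: reasoning text alternating with code.
solution = [
    ("text", "Let x be the unknown; solve 3x + 4 = 19."),
    ("code", "x = (19 - 4) / 3\nprint(x)"),
    ("text", "Check the value by substituting it back."),
    ("code", "print(3 * x + 4 == 19)"),
]

for kind, content in run_interleaved(solution):
    print(f"[{kind}] {content}")
```

In a real system the execution result would be fed back to the model so that it can continue reasoning from it; here the segments are fixed in advance to keep the example self-contained.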
