
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

July 4, 2024
作者: Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu
cs.AI

Abstract

Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employ Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and applying query evolution to the GSM8K and MATH datasets, we generate an instruction fine-tuning dataset called DotaMathQA with 574K query-response pairs. We train a series of base LLMs on DotaMathQA using imitation learning, yielding DotaMath models that achieve remarkable performance relative to open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B reaches 64.8% on the competitive MATH dataset and 86.7% on GSM8K. In addition, it remains strongly competitive across a series of in-domain and out-of-domain benchmarks (avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for tackling intricate mathematical problems. Our code is publicly available at https://github.com/ChengpengLi1003/DotaMath.
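The decompose/solve/correct loop the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration: the subtask format, the toy solver, and the deliberately buggy first draft are stand-ins for the paper's LLM-driven components, not its actual prompts or models.

```python
# Toy sketch of a DotaMath-style loop: decompose a task into subtasks,
# produce a code draft for each, execute it, and regenerate once when
# the interpreter reports a failure.

def solve_subtask(subtask, attempt):
    """Return a code 'draft' (here, a lambda) for one subtask.

    The first draft for "divide" is deliberately buggy so that the
    interpreter's error feedback triggers a self-correction round.
    """
    op, arg = subtask
    if op == "divide" and attempt == 0:
        return lambda x: x / 0  # buggy first draft
    return {
        "add": lambda x: x + arg,
        "mul": lambda x: x * arg,
        "divide": lambda x: x / arg,
    }[op]

def run(subtasks, max_attempts=2):
    """Execute subtasks in order, retrying each once on failure."""
    value = 0
    for subtask in subtasks:
        for attempt in range(max_attempts):
            try:
                value = solve_subtask(subtask, attempt)(value)
                break  # interpreter reported success; move on
            except ZeroDivisionError:
                continue  # fine-grained feedback -> regenerate draft
    return value
```

For example, `run([("add", 6), ("mul", 7), ("divide", 2)])` hits the buggy first "divide" draft, recovers via the retry, and returns 21.0.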


November 28, 2024