DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

July 4, 2024
Authors: Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu
cs.AI

Abstract

Large language models (LLMs) have made impressive progress on simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce DotaMath, a series of LLMs that employ Decomposition of thought with code assistance and self-correction for mathematical reasoning. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and applying query evolution to the GSM8K and MATH datasets, we generate an instruction fine-tuning dataset, DotaMathQA, with 574K query-response pairs. We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B achieves 64.8% on the competitive MATH dataset and 86.7% on GSM8K. In addition, DotaMath-deepseek-7B remains strongly competitive on a series of in-domain and out-of-domain benchmarks (Avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for addressing intricate mathematical problems. Our code is publicly available at https://github.com/ChengpengLi1003/DotaMath.
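
The loop the abstract describes (decompose into subtasks, solve them with code, read interpreter feedback, correct) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_code`, `solve`, and `scripted_model` are hypothetical names, and the "model" is a hard-coded stand-in for an LLM so that the example runs on its own. The word problem is likewise invented for illustration.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple


def run_code(code: str) -> Tuple[bool, str]:
    """Run code in a fresh namespace; return (ok, stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: exec is NOT a sandbox
        return True, buffer.getvalue().strip()
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def solve(
    question: str,
    generate: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Ask the model for code, execute it, and feed errors back until it runs."""
    history: List[Tuple[str, str]] = []
    feedback = ""
    for _ in range(max_rounds):
        code = generate(question, feedback)
        ok, output = run_code(code)
        history.append((code, output))
        if ok:
            return output, history
        feedback = output  # interpreter error becomes the correction signal
    return None, history


def scripted_model(question: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM: one buggy attempt, then a fix."""
    if not feedback:
        # First attempt contains a deliberate typo ("perimiter").
        return "perimeter = 20\nwidth = (perimiter / 2 - 4) / 2\nprint(width)"
    return (
        "perimeter = 20\n"
        "width = (perimeter / 2 - 4) / 2   # subtask 1: solve for the width\n"
        "length = width + 4                # subtask 2: derive the length\n"
        "print('width =', width)\n"
        "print('length =', length)\n"
        "print('area =', width * length)   # subtask 3: compute the area\n"
    )


question = (
    "A rectangle has perimeter 20 and its length is 4 more than its width. "
    "What is its area?"
)
answer, history = solve(question, scripted_model)
print(answer)
```

Printing each subtask's intermediate value is what gives the interpreter output its fine-grained character: a wrong intermediate result or an exception points at a specific step instead of only the final answer. In this sketch only exceptions trigger a retry; checking intermediate values for plausibility would require the model itself.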
