

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

March 21, 2025
作者: Zhuoshi Pan, Yu Li, Honglin Lin, Qizhi Pei, Zinan Tang, Wei Wu, Chenlin Ming, H. Vicky Zhao, Conghui He, Lijun Wu
cs.AI

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., distilling high-quality correct solutions from advanced models, neglecting the value contained in error data, which potentially hinders the model's reflective ability. Though some studies attempt to leverage error data, they often involve complex mechanisms, such as Monte Carlo Tree Search (MCTS) to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability by Learning from Errors for Mathematical Advancement (LEMMA). LEMMA constructs data consisting of an incorrect solution with an erroneous step and a reflection connection to a correct solution for fine-tuning. Specifically, we systematically analyze the model-generated error types and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are obtained either by fixing the errors or by generating a fresh solution from scratch. Through a model-aware smooth reflection connection, the erroneous solution is transitioned to the correct one. By fine-tuning on the constructed dataset, the model is able to self-correct errors autonomously within the generation process, without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.
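The data construction described in the abstract can be sketched roughly as follows: an erroneous solution prefix is joined to a correct continuation through a short reflection phrase, producing a fine-tuning example that teaches the model to self-correct mid-generation. This is a minimal illustrative sketch; the function name, reflection template, and step format are assumptions, not the paper's actual implementation.

```python
def build_training_example(question, bad_solution, bad_step, good_solution,
                           reflection="Wait, this step is wrong. Let me reconsider."):
    """Assemble one reflection-connected training example (illustrative).

    bad_solution and good_solution are lists of reasoning steps (strings);
    bad_step is the index of the first erroneous step. The target keeps the
    erroneous prefix up to and including that step, inserts a reflection
    phrase, then continues with the correct solution.
    """
    erroneous_prefix = bad_solution[:bad_step + 1]
    target = "\n".join(erroneous_prefix + [reflection] + good_solution)
    return {"prompt": question, "completion": target}


# Toy usage: an arithmetic slip (12 * 5 miscomputed as 50) gets corrected.
example = build_training_example(
    question="What is 12 * 15?",
    bad_solution=["12 * 15 = 12 * 10 + 12 * 5", "= 120 + 50"],
    bad_step=1,
    good_solution=["Actually, 12 * 5 = 60, so 12 * 15 = 120 + 60", "= 180"],
)
print(example["completion"])
```

Fine-tuning on such examples, rather than on correct solutions alone, is what lets the model recover from its own errors without an external critique model.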

