Learning From Mistakes Makes LLM Better Reasoner
October 31, 2023
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
cs.AI
Abstract
Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), akin to human learning processes. Consider a human student who has failed to solve a math problem: they will learn from the mistake they made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance compared with fine-tuning on CoT data alone. Impressively, LeMa also benefits specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH, surpassing the SOTA performance achieved by non-execution open-source models on these challenging tasks. Our code, data, and models will be publicly available at https://github.com/microsoft/CodeT.
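
To make the data-generation pipeline concrete, below is a minimal Python sketch of the mistake-correction loop the abstract describes: collect a wrong reasoning path, ask a "corrector" model to identify the mistake step, explain it, and produce a corrected answer. The prompt wording, the `call_gpt4` stub, and the `CorrectionPair` schema are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the LeMa correction-data generation loop (assumptions noted above).
import json
from dataclasses import dataclass

# Assumed corrector prompt mirroring the three steps named in the abstract.
CORRECTOR_PROMPT = """You are given a math problem and an incorrect solution.
(1) Identify the first mistaken step.
(2) Explain why it is wrong.
(3) Correct the solution and give the final answer.

Problem: {question}
Incorrect solution: {bad_path}
"""

@dataclass
class CorrectionPair:
    question: str
    inaccurate_path: str  # wrong CoT collected from a backbone LLM
    correction: str       # corrector output: mistake step + reason + fixed answer

def call_gpt4(prompt: str) -> str:
    """Hypothetical stand-in for a real GPT-4 API call; returns a canned
    correction so this sketch runs end to end without network access."""
    return ("Mistake step: step 1 (computed 7 + 5 = 13). "
            "Reason: arithmetic slip. "
            "Correction: 7 + 5 = 12, so the final answer is 12.")

def build_lema_data(samples):
    """Turn (question, wrong_path) pairs into mistake-correction
    fine-tuning examples."""
    pairs = []
    for question, bad_path in samples:
        prompt = CORRECTOR_PROMPT.format(question=question, bad_path=bad_path)
        pairs.append(CorrectionPair(question, bad_path, call_gpt4(prompt)))
    return pairs

if __name__ == "__main__":
    data = build_lema_data([("What is 7 + 5?", "7 + 5 = 13, so the answer is 13.")])
    print(json.dumps([vars(p) for p in data], indent=2))
```

The resulting pairs would then be mixed with ordinary CoT data for fine-tuning; filtering corrections whose final answers are still wrong is a natural extra step, though the exact filtering used by the authors is not specified in this abstract.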