Learning From Mistakes Makes LLM Better Reasoner
October 31, 2023
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
cs.AI
Abstract
Large language models (LLMs) have recently exhibited remarkable reasoning
capabilities in solving math problems. To further improve this capability, this
work proposes Learning from Mistakes (LeMa), akin to human learning processes.
Consider a human student who failed to solve a math problem: they will learn from
the mistake they made and how to correct it. Mimicking this error-driven
learning process, LeMa fine-tunes LLMs on mistake-correction data pairs
generated by GPT-4. Specifically, we first collect inaccurate reasoning paths
from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the
mistake step, (2) explain the reason for the mistake, and (3) correct the
mistake and generate the final answer. Experimental results demonstrate the
effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning
tasks, LeMa consistently improves performance compared with fine-tuning on
CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as
WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on
MATH. This surpasses the SOTA performance achieved by non-execution open-source
models on these challenging tasks. Our code, data, and models will be publicly
available at https://github.com/microsoft/CodeT.
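
To make the data-construction step concrete, below is a minimal Python sketch of the "GPT-4 as corrector" pipeline the abstract describes. The prompt wording, the extract_final_answer helper, and the answer-matching filter are illustrative assumptions rather than the paper's exact implementation; only the three-part corrector output (identify the mistake step, explain the reason, correct and give the final answer) follows the abstract.

```python
# Minimal sketch of LeMa-style mistake-correction data construction.
# Assumptions (not from the paper): the exact corrector prompt wording,
# the helper names, and the answer-matching rule are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CORRECTOR_PROMPT = (
    "You are given a math word problem and an incorrect solution.\n"
    "1. Identify the first incorrect step.\n"
    "2. Explain why that step is wrong.\n"
    "3. Correct the solution from that step and give the final answer.\n\n"
    "Problem: {question}\n"
    "Incorrect solution: {wrong_path}\n"
)

def extract_final_answer(text: str) -> str:
    """Hypothetical helper: pull the final answer token out of a solution.

    A real implementation would parse phrasings like "The answer is 42."
    robustly; this crude version just takes the last whitespace-separated token.
    """
    return text.strip().split()[-1].rstrip(".")

def build_correction_pair(question: str, wrong_path: str, gold_answer: str):
    """Ask GPT-4 to act as the corrector on one inaccurate reasoning path,
    keeping the pair only if the corrected final answer matches the reference."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": CORRECTOR_PROMPT.format(
                question=question, wrong_path=wrong_path),
        }],
        temperature=0,
    )
    correction = resp.choices[0].message.content
    if extract_final_answer(correction) != gold_answer:
        return None  # discard corrections that still end in a wrong answer
    # The surviving (input, target) pair becomes fine-tuning data,
    # used alongside ordinary CoT examples.
    return {
        "input": f"{question}\nIncorrect solution: {wrong_path}",
        "target": correction,
    }
```

Under these assumptions, filtering on the reference answer is what keeps the generated corrections reliable enough to fine-tune on; pairs whose corrected solution still reaches a wrong answer are simply dropped.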