Learning From Mistakes Makes LLM Better Reasoner
October 31, 2023
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
cs.AI
Abstract
Large language models (LLMs) have recently exhibited remarkable reasoning
capabilities in solving math problems. To further improve this capability, this
work proposes Learning from Mistakes (LeMa), akin to human learning processes.
Consider a human student who failed to solve a math problem: they will learn from
the mistake they made and how to correct it. Mimicking this error-driven
learning process, LeMa fine-tunes LLMs on mistake-correction data pairs
generated by GPT-4. Specifically, we first collect inaccurate reasoning paths
from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the
mistake step, (2) explain the reason for the mistake, and (3) correct the
mistake and generate the final answer. Experimental results demonstrate the
effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning
tasks, LeMa consistently improves performance compared with fine-tuning on
CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as
WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on
MATH. This surpasses the SOTA performance achieved by non-execution open-source
models on these challenging tasks. Our code, data, and models will be publicly
available at https://github.com/microsoft/CodeT.
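
To make the data-construction step concrete, below is a minimal Python sketch of the "GPT-4 as corrector" pipeline the abstract describes. The prompt wording, the extract_final_answer helper, and the answer-matching filter are illustrative assumptions rather than the paper's exact implementation; only the three-part corrector output (identify the mistake step, explain the reason, correct and give the final answer) follows the abstract.

```python
# Minimal sketch of LeMa-style mistake-correction data construction.
# Assumptions (not from the paper): the exact corrector prompt wording,
# the helper names, and the answer-matching rule are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CORRECTOR_PROMPT = (
    "You are given a math word problem and an incorrect solution.\n"
    "1. Identify the first incorrect step.\n"
    "2. Explain why that step is wrong.\n"
    "3. Correct the solution from that step and give the final answer.\n\n"
    "Problem: {question}\n"
    "Incorrect solution: {wrong_path}\n"
)

def extract_final_answer(text: str) -> str:
    """Hypothetical helper: pull the final answer token out of a solution.

    A real implementation would parse phrasings like "The answer is 42."
    robustly; this crude version just takes the last whitespace-separated token.
    """
    return text.strip().split()[-1].rstrip(".")

def build_correction_pair(question: str, wrong_path: str, gold_answer: str):
    """Ask GPT-4 to act as the corrector on one inaccurate reasoning path,
    keeping the pair only if the corrected final answer matches the reference."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": CORRECTOR_PROMPT.format(
                question=question, wrong_path=wrong_path),
        }],
        temperature=0,
    )
    correction = resp.choices[0].message.content
    if extract_final_answer(correction) != gold_answer:
        return None  # discard corrections that still end in a wrong answer
    # The surviving (input, target) pair becomes fine-tuning data,
    # used alongside ordinary CoT examples.
    return {
        "input": f"{question}\nIncorrect solution: {wrong_path}",
        "target": correction,
    }
```

Under these assumptions, filtering on the reference answer is what keeps the generated corrections reliable enough to fine-tune on; pairs whose corrected solution still reaches a wrong answer are simply dropped.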