Learning From Mistakes Makes LLM Better Reasoner

Abstract

Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), akin to human learning processes. Consider a human student who failed to solve a math problem: the student learns what mistake was made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance over fine-tuning on CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the SOTA performance achieved by non-execution open-source models on these challenging tasks. Our code, data and models will be publicly available at https://github.com/microsoft/CodeT.
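The data-collection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `ReasoningPath`, `extract_final_answer`, `is_inaccurate`, `build_corrector_prompt`, and `make_finetuning_example` are hypothetical, the prompt wording and the input/output record format are assumptions, and the corrector call itself is left out, so the correction text is passed in as a plain string.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReasoningPath:
    """One model-generated solution attempt (hypothetical container)."""
    question: str
    steps: List[str]   # chain-of-thought steps produced by some LLM
    gold_answer: str   # reference answer; assumed to be a plain number


def extract_final_answer(steps: List[str]) -> Optional[str]:
    """Return the last number in the final step, or None if there is none."""
    if not steps:
        return None
    # Drop thousands separators so "1,000" is read as "1000".
    numbers = re.findall(r"-?\d+(?:\.\d+)?", steps[-1].replace(",", ""))
    return numbers[-1] if numbers else None


def is_inaccurate(path: ReasoningPath) -> bool:
    """A path counts as inaccurate when its final answer differs from the reference."""
    predicted = extract_final_answer(path.steps)
    return predicted is None or float(predicted) != float(path.gold_answer)


def build_corrector_prompt(path: ReasoningPath) -> str:
    """Ask the corrector for the three items listed in the abstract (assumed wording)."""
    numbered = "\n".join(f"Step {i}: {s}" for i, s in enumerate(path.steps, 1))
    return (
        f"Question: {path.question}\n"
        f"Incorrect solution:\n{numbered}\n\n"
        "1. Identify the step that contains the mistake.\n"
        "2. Explain why that step is wrong.\n"
        "3. Correct the mistake and give the final answer."
    )


def make_finetuning_example(path: ReasoningPath, correction: str) -> Dict[str, str]:
    """Pair the corrector prompt with the returned correction (assumed record format)."""
    return {"input": build_corrector_prompt(path), "output": correction}


if __name__ == "__main__":
    attempt = ReasoningPath(
        question="Tom has 3 apples and buys 4 more. How many apples does he have?",
        steps=["Tom starts with 3 apples.", "He buys 4 more, so 3 + 4 = 8."],
        gold_answer="7",
    )
    if is_inaccurate(attempt):
        print(build_corrector_prompt(attempt))
```

In this sketch only paths whose final answer disagrees with the reference are sent to the corrector, and each returned correction would be stored next to its prompt as one mistake-correction pair for fine-tuning.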
