Learning From Mistakes Makes LLM Better Reasoner

Abstract

Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), which is akin to the human learning process. Consider a student who fails to solve a math problem: the student learns what mistake was made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance compared with fine-tuning on CoT data alone. Notably, LeMa also benefits specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the state-of-the-art (SOTA) performance achieved by non-execution open-source models on these challenging tasks. Our code, data, and models will be publicly available at https://github.com/microsoft/CodeT.
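
The three corrector steps and the resulting mistake-correction pair can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `CorrectionExample`, `build_corrector_prompt`, and `to_finetune_pair`, the prompt wording, the input/output text layout, and the toy word problem are hypothetical and are not taken from the LeMa code or data. The sketch makes no call to GPT-4; it only shows how a collected wrong reasoning path and a correction could be arranged into a fine-tuning pair.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CorrectionExample:
    """One mistake-correction record (hypothetical layout, not the paper's schema)."""
    question: str
    wrong_steps: list[str]       # inaccurate reasoning path collected from an LLM
    mistake_step: int            # 1-based index of the first wrong step
    explanation: str             # why that step is wrong
    corrected_steps: list[str]   # corrected reasoning path
    final_answer: str


def format_steps(steps: list[str]) -> str:
    """Number the reasoning steps so a corrector can refer to them."""
    return "\n".join(f"Step {i}: {s}" for i, s in enumerate(steps, start=1))


def build_corrector_prompt(question: str, wrong_steps: list[str]) -> str:
    """Assemble a request covering the three corrector tasks (wording is assumed)."""
    return (
        f"Question: {question}\n"
        f"Incorrect solution:\n{format_steps(wrong_steps)}\n\n"
        "1. Identify the mistake step.\n"
        "2. Explain the reason for the mistake.\n"
        "3. Correct the mistake and give the final answer."
    )


def to_finetune_pair(ex: CorrectionExample) -> dict[str, str]:
    """Turn a correction record into an input/output pair for fine-tuning."""
    if not 1 <= ex.mistake_step <= len(ex.wrong_steps):
        raise ValueError("mistake_step must index one of the wrong steps")
    source = (
        f"Question: {ex.question}\n"
        f"Incorrect solution:\n{format_steps(ex.wrong_steps)}"
    )
    target = (
        f"Incorrect step: Step {ex.mistake_step}\n"
        f"Explanation: {ex.explanation}\n"
        f"Correct solution:\n{format_steps(ex.corrected_steps)}\n"
        f"Final answer: {ex.final_answer}"
    )
    return {"input": source, "output": target}


# Toy record written for this sketch; it is not drawn from GSM8K or MATH.
example = CorrectionExample(
    question="Tom has 3 boxes with 4 apples each. He eats 2 apples. How many are left?",
    wrong_steps=[
        "3 boxes hold 3 + 4 = 7 apples.",
        "After eating 2, 7 - 2 = 5 apples remain.",
    ],
    mistake_step=1,
    explanation="Each box holds 4 apples, so the total is a product, not a sum.",
    corrected_steps=[
        "3 boxes hold 3 * 4 = 12 apples.",
        "After eating 2, 12 - 2 = 10 apples remain.",
    ],
    final_answer="10",
)

pair = to_finetune_pair(example)
print(build_corrector_prompt(example.question, example.wrong_steps))
print(pair["output"])
```

In this arrangement the wrong reasoning path sits on the input side and the identification, explanation, and correction sit on the output side, so a fine-tuned model would learn to produce all three. Whether such pairs are mixed with ordinary CoT data, and in what proportion, is not specified in the abstract and is left open here.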
