LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., by distilling high-quality correct solutions from advanced models. This neglects the value contained in error data and potentially hinders the model's reflective ability. Although some studies attempt to leverage error data, they often involve complex mechanisms, such as Monte Carlo Tree Search (MCTS), to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability by Learning from Errors for Mathematical Advancement (LEMMA). For fine-tuning, LEMMA constructs data in which each example consists of an incorrect solution containing an erroneous step, followed by a reflection connection to a correct solution. Specifically, we systematically analyze the model-generated error types and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are obtained either by fixing the errors or by generating a fresh start. Through a model-aware smooth reflection connection, the erroneous solution transitions to the correct one. By fine-tuning on the constructed dataset, the model is able to self-correct errors autonomously within the generation process, without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.
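The data construction described in the abstract can be pictured with a minimal sketch. It only illustrates the shape of one training target (erroneous steps, then a reflection connection, then a correct solution). The class name, field names, strategy labels, and connective phrases below are hypothetical and do not come from the paper, and the error-type grounded augmentation and model-aware connection steps are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReflectionExample:
    """One hypothetical training example (all names are illustrative)."""
    question: str
    erroneous_steps: List[str]  # partial solution ending at the mistaken step
    correct_solution: str       # obtained by fixing the error or by restarting
    strategy: str               # "fix" or "fresh_start" (assumed labels)


# Assumed connective phrases; the actual wording is not given in the source.
CONNECTORS = {
    "fix": "Wait, the previous step looks wrong. Let me correct it.",
    "fresh_start": "Wait, this approach is not working. Let me start over.",
}


def build_target(example: ReflectionExample) -> str:
    """Join erroneous steps, a reflection connector, and the correct solution."""
    if example.strategy not in CONNECTORS:
        raise ValueError(f"unknown strategy: {example.strategy}")
    parts = [
        *example.erroneous_steps,
        CONNECTORS[example.strategy],
        example.correct_solution,
    ]
    return "\n".join(parts)


example = ReflectionExample(
    question="What is 12 * 15?",
    erroneous_steps=["12 * 15 = 12 * 10 + 12 * 5", "= 120 + 50 = 170"],
    correct_solution="12 * 5 = 60, so 12 * 15 = 120 + 60 = 180.",
    strategy="fix",
)
target = build_target(example)
print(target)
```

Under these assumptions, a fine-tuning pair would use the question as the prompt and `target` as the completion, so that the correction happens inside a single generation rather than through a separate critique model.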
