ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Abstract

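Large language models (LLMs) have shown excellent mastery of human language, but they still struggle in real-world applications that require mathematical problem-solving. Although many strategies and datasets have been developed to enhance the mathematical abilities of LLMs, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems. In this work, we tailor a Self-Critique pipeline to address this challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially apply rejective fine-tuning and direct preference optimization, collecting training data from the LLM's own generations. Based on ChatGLM3-32B, we conduct a series of experiments on both academic benchmarks and MathUserEval, a challenging dataset we newly created. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs up to twice its size. Related techniques have been deployed to ChatGLM (https://chatglm.cn), an online LLM service. The evaluation dataset and scripts are released at https://github.com/THUDM/ChatGLM-Math.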
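As a rough illustration of the data-collection step described above, the following minimal Python sketch turns critic scores into rejective fine-tuning data and DPO preference pairs. It assumes the Math-Critique model has already scored each sampled answer (a 1-to-10 scale is assumed here). The names `Sample`, `select_rft_data`, and `build_dpo_pairs` and the thresholds `min_score`, `max_per_question`, and `min_gap` are hypothetical illustrations, not the paper's implementation or settings. `dpo_loss` is the standard DPO objective computed from sequence log-probabilities and is not specific to ChatGLM-Math.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled answer and its critic score (a 1-to-10 scale is assumed)."""
    question: str
    answer: str
    score: float


def select_rft_data(samples, min_score=8.0, max_per_question=4):
    """Rejective fine-tuning data: keep only answers the critic rates highly.

    min_score and max_per_question are hypothetical, illustrative settings.
    """
    kept = {}
    # Visit answers from highest to lowest score so the best ones fill each cap first.
    for s in sorted(samples, key=lambda s: s.score, reverse=True):
        bucket = kept.setdefault(s.question, [])
        if s.score < min_score or s.answer in bucket or len(bucket) >= max_per_question:
            continue  # rejected: low score, exact duplicate, or cap reached
        bucket.append(s.answer)
    return [(q, a) for q, answers in kept.items() for a in answers]


def build_dpo_pairs(samples, min_gap=3.0):
    """Preference pairs: best vs. worst answer per question (hypothetical rule)."""
    by_question = {}
    for s in samples:
        by_question.setdefault(s.question, []).append(s)
    pairs = []
    for q, group in by_question.items():
        best = max(group, key=lambda s: s.score)
        worst = min(group, key=lambda s: s.score)
        # Only keep pairs whose critic scores are clearly separated.
        if best.score - worst.score >= min_gap:
            pairs.append((q, best.answer, worst.answer))  # (prompt, chosen, rejected)
    return pairs


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss -log(sigmoid(m)) for one pair of sequence log-probs."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) = log(1 + exp(-m)); branch on the sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


samples = [
    Sample("1+1?", "2", 9.0),
    Sample("1+1?", "2", 9.0),  # exact duplicate, dropped by the RFT filter
    Sample("1+1?", "3", 2.0),  # low score, rejected for RFT but used as "rejected" in DPO
]
print(select_rft_data(samples))  # [('1+1?', '2')]
print(build_dpo_pairs(samples))  # [('1+1?', '2', '3')]
```

Pairing the single best and worst answer per question is only one plausible rule; pairing every high-scoring answer with every low-scoring one would also fit the description above.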
