

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

April 3, 2024
Authors: Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong
cs.AI

Abstract

Large language models (LLMs) have shown excellent mastery of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets for enhancing LLMs' mathematical abilities have been developed, it remains a challenge to simultaneously maintain and improve both the language and mathematical capabilities of deployed LLM systems. In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback-learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic datasets and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM (https://chatglm.cn), an online serving LLM. The related evaluation dataset and scripts are released at https://github.com/THUDM/ChatGLM-Math.
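The abstract describes a two-stage data-collection loop: a Math-Critique model scores the policy model's own generations, high-scoring answers feed rejective fine-tuning (RFT), and best/worst answer pairs feed direct preference optimization (DPO). The sketch below illustrates that loop under stated assumptions; it is not the paper's implementation. The functions `generate_answers` and `math_critique_score`, the sample count, and the score threshold are hypothetical stand-ins for the ChatGLM3-32B sampler and the Math-Critique model.

```python
# Minimal sketch (assumed interfaces) of the self-critique data-collection loop:
# 1) sample several answers per question from the policy model,
# 2) score each answer with a critique model,
# 3) keep high-scoring answers for rejective fine-tuning (RFT),
# 4) keep (best, worst) answer pairs for direct preference optimization (DPO).
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PreferencePair:
    question: str
    chosen: str    # highest-scoring answer according to the critique model
    rejected: str  # lowest-scoring answer according to the critique model


def collect_self_critique_data(
    questions: List[str],
    generate_answers: Callable[[str, int], List[str]],  # hypothetical: (question, n) -> sampled answers
    math_critique_score: Callable[[str, str], float],   # hypothetical: (question, answer) -> score in [0, 1]
    n_samples: int = 8,          # assumed sampling budget per question
    rft_threshold: float = 0.8,  # assumed acceptance threshold for RFT data
) -> Tuple[List[Tuple[str, str]], List[PreferencePair]]:
    """Collect RFT examples and DPO preference pairs from the model's own generations."""
    rft_data: List[Tuple[str, str]] = []
    dpo_pairs: List[PreferencePair] = []

    for q in questions:
        answers = generate_answers(q, n_samples)
        # Score every sampled answer and sort from best to worst.
        scored = sorted(
            ((math_critique_score(q, a), a) for a in answers),
            key=lambda pair: pair[0],
            reverse=True,
        )
        best_score, best_answer = scored[0]
        worst_score, worst_answer = scored[-1]

        # Rejective fine-tuning: keep only generations the critique model rates highly.
        if best_score >= rft_threshold:
            rft_data.append((q, best_answer))

        # DPO: form a preference pair only when the critique model separates the answers.
        if best_score > worst_score:
            dpo_pairs.append(PreferencePair(q, best_answer, worst_answer))

    return rft_data, dpo_pairs
```

The filtered (question, answer) pairs would then be used for supervised fine-tuning, and the preference pairs for a DPO training stage, matching the sequential RFT-then-DPO order described above.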
