InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
February 9, 2024
Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin
cs.AI
Abstract
The math abilities of large language models can represent their abstract
reasoning ability. In this paper, we introduce and open-source our math
reasoning LLMs InternLM-Math, which are continually pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data
augmentation, and code interpretation in a unified seq2seq format and supervise
our model to be a versatile math reasoner, verifier, prover, and augmenter.
These abilities can be used to develop the next generation of math LLMs or for
self-iteration. InternLM-Math obtains open-source state-of-the-art performance
under the settings of in-context learning, supervised fine-tuning, and
code-assisted reasoning on various informal and formal benchmarks, including
GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our
pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We
further explore how to use LEAN to solve math problems and study its
performance under a multi-task learning setting, which shows the possibility of
using LEAN as a unified platform for both solving and proving in math. Our
models, code, and data are released at https://github.com/InternLM/InternLM-Math.
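
The abstract describes folding several capabilities (reasoner, verifier, prover, augmenter) into one seq2seq format. As a purely illustrative sketch of what such multi-task training samples could look like, the snippet below uses hypothetical task tags and templates that are not taken from the paper:

```python
# Hypothetical sketch of multi-task seq2seq training samples, in which each
# capability is expressed as a (prompt, response) text pair. The task tags and
# templates below are illustrative assumptions, not the paper's actual format.
samples = [
    {   # Chain-of-thought reasoning: solve a problem step by step.
        "prompt": "[REASON] Natalia sold 48 clips in April and half as many in May. How many in total?",
        "response": "In May she sold 48 / 2 = 24 clips, so 48 + 24 = 72 in total. Answer: 72",
    },
    {   # Reward modeling / verification: judge a candidate solution.
        "prompt": "[VERIFY] Question: 2 + 3 * 4 = ? Candidate answer: 20",
        "response": "Incorrect. Multiplication precedes addition: 2 + 12 = 14.",
    },
    {   # Formal reasoning: translate an informal statement into Lean.
        "prompt": "[FORMALIZE] For all natural numbers a and b, a + b = b + a.",
        "response": "theorem my_thm (a b : Nat) : a + b = b + a := Nat.add_comm a b",
    },
    {   # Data augmentation: rewrite a problem into a new training example.
        "prompt": "[AUGMENT] Rephrase: A train travels 60 km in 1 hour. How far in 3 hours?",
        "response": "A cyclist rides 60 km per hour. What distance is covered after 3 hours?",
    },
]

# In a seq2seq setup, every sample is trained identically: the model reads the
# prompt and is supervised to generate the response token by token.
for s in samples:
    print(s["prompt"], "->", s["response"])
```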
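
The claim that LEAN can serve as a unified platform for both solving and proving can be made concrete with a minimal Lean 4 sketch (our own illustration, not an excerpt from the paper): `#eval` computes a concrete answer, while a `theorem` certifies a general statement.

```lean
-- Solving: Lean evaluates a concrete expression to produce an answer.
#eval 48 + 48 / 2        -- 72

-- Proving: Lean checks a formal certificate that the statement holds
-- for all natural numbers, not just one instance.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```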