InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
February 9, 2024
Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin
cs.AI
Abstract
The math abilities of large language models can represent their abstract
reasoning ability. In this paper, we introduce and open-source our math
reasoning LLMs InternLM-Math, which are continually pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data
augmentation, and code interpretation in a unified seq2seq format and supervise
our model to be a versatile math reasoner, verifier, prover, and augmenter.
These abilities can be used to develop the next generation of math LLMs or for
self-iteration. InternLM-Math obtains open-source state-of-the-art performance
under the settings of in-context learning, supervised fine-tuning, and
code-assisted reasoning on various informal and formal benchmarks, including
GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our
pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We
further explore how to use LEAN to solve math problems and study its
performance under a multi-task learning setting, which shows the possibility of
using LEAN as a unified platform for both solving and proving in math. Our
models, code, and data are released at https://github.com/InternLM/InternLM-Math.
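
The abstract describes folding several capabilities (reasoner, verifier, prover, augmenter) into one seq2seq format. As a purely illustrative sketch of what such multi-task training samples could look like, the snippet below uses hypothetical task tags and templates that are not taken from the paper:

```python
# Hypothetical sketch of multi-task seq2seq training samples, in which each
# capability is expressed as a (prompt, response) text pair. The task tags and
# templates below are illustrative assumptions, not the paper's actual format.
samples = [
    {   # Chain-of-thought reasoning: solve a problem step by step.
        "prompt": "[REASON] Natalia sold 48 clips in April and half as many in May. How many in total?",
        "response": "In May she sold 48 / 2 = 24 clips, so 48 + 24 = 72 in total. Answer: 72",
    },
    {   # Reward modeling / verification: judge a candidate solution.
        "prompt": "[VERIFY] Question: 2 + 3 * 4 = ? Candidate answer: 20",
        "response": "Incorrect. Multiplication precedes addition: 2 + 12 = 14.",
    },
    {   # Formal reasoning: translate an informal statement into Lean.
        "prompt": "[FORMALIZE] For all natural numbers a and b, a + b = b + a.",
        "response": "theorem my_thm (a b : Nat) : a + b = b + a := Nat.add_comm a b",
    },
    {   # Data augmentation: rewrite a problem into a new training example.
        "prompt": "[AUGMENT] Rephrase: A train travels 60 km in 1 hour. How far in 3 hours?",
        "response": "A cyclist rides 60 km per hour. What distance is covered after 3 hours?",
    },
]

# In a seq2seq setup, every sample is trained identically: the model reads the
# prompt and is supervised to generate the response token by token.
for s in samples:
    print(s["prompt"], "->", s["response"])
```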
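
The claim that LEAN can serve as a unified platform for both solving and proving can be made concrete with a minimal Lean 4 sketch (our own illustration, not an excerpt from the paper): `#eval` computes a concrete answer, while a `theorem` certifies a general statement.

```lean
-- Solving: Lean evaluates a concrete expression to produce an answer.
#eval 48 + 48 / 2        -- 72

-- Proving: Lean checks a formal certificate that the statement holds
-- for all natural numbers, not just one instance.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```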