InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Abstract

The math abilities of large language models can reflect their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter use into a single seq2seq format, and we supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next generation of math LLMs or for self-iteration. InternLM-Math obtains open-source state-of-the-art performance under in-context learning, supervised fine-tuning, and code-assisted reasoning on various informal and formal benchmarks, including GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under multi-task learning; the results show the possibility of using LEAN as a unified platform for both solving and proving in math. Our models, code, and data are released at https://github.com/InternLM/InternLM-Math.
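
To illustrate what "using LEAN as a unified platform for solving and proving" can look like, here is a minimal sketch in plain Lean 4 (core only, no Mathlib). It is not taken from the paper, and the paper's actual Lean formats may differ. A hypothetical word problem is encoded as a definition whose value can be computed ("solving"). The computed answer is then stated and machine-checked as a theorem, and a general property is proved ("proving"). The names eggsLeft, eggsLeft_eq, and double_even are illustrative only.

```lean
-- Hypothetical word problem (illustrative, not from the paper):
-- 3 boxes hold 12 eggs each, and 5 eggs are eaten. How many remain?
def eggsLeft : Nat := 3 * 12 - 5

-- "Solving": evaluate the answer directly.
#eval eggsLeft  -- 31

-- "Verifying": state the answer as a theorem the kernel checks.
theorem eggsLeft_eq : eggsLeft = 31 := by rfl

-- "Proving": a general statement, discharged by the core `omega` tactic.
theorem double_even (n : Nat) : (n + n) % 2 = 0 := by omega
```

Because the computed answer and its correctness claim live in the same checked file, a wrong answer shows up as a proof failure instead of passing silently. This is one reading of why a single formal platform could serve both problem solving and theorem proving.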
