InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Abstract

The mathematical ability of a large language model can reflect its capacity for abstract reasoning. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpretation in a single seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next generation of math LLMs or to support self-iteration. InternLM-Math achieves state-of-the-art performance among open-source models under in-context learning, supervised fine-tuning, and code-assisted reasoning settings on various informal and formal benchmarks, including GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under multi-task learning, which shows the possibility of using LEAN as a unified platform for both solving and proving in math. Our models, code, and data are released at https://github.com/InternLM/InternLM-Math.
