InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Abstract

The mathematical abilities of large language models (LLMs) can serve as an indicator of their abstract reasoning ability. In this paper, we introduce and open-source InternLM-Math, a family of math reasoning LLMs obtained by continued pre-training from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and the use of a code interpreter in a single seq2seq format, and we supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next generation of math LLMs or for self-iteration. InternLM-Math achieves state-of-the-art performance among open-source models under in-context learning, supervised fine-tuning, and code-assisted reasoning settings on a range of informal and formal benchmarks, including GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our pre-trained model scores 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance in a multi-task learning setting, which shows the possibility of using LEAN as a unified platform for both solving and proving in mathematics. Our models, code, and data are released at https://github.com/InternLM/InternLM-Math.
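As a rough illustration of what using one platform for both solving and proving could look like, the Lean 4 sketch below first evaluates the answer to a toy arithmetic problem and then states and proves that answer as a theorem. The problem, the name `totalApples`, and the choice of plain Lean 4 without Mathlib are assumptions made for this example. It is not taken from the InternLM-Math data, prompts, or evaluation setup.

```lean
-- Hypothetical toy problem (not from the paper): 3 boxes with 12 apples each.
def totalApples (perBox boxes : Nat) : Nat := perBox * boxes

-- "Solving": ask Lean to evaluate the answer.
#eval totalApples 12 3

-- "Proving": state the result as a theorem.
-- `rfl` succeeds because both sides reduce to the same natural number.
theorem totalApples_example : totalApples 12 3 = 36 := rfl
```

The point of the sketch is only that the computed answer and the checked statement live in the same language, so a single system can both produce the answer and verify it.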

