Llemma: An Open Language Model For Mathematics

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite, on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
