Llemma: An Open Language Model For Mathematics

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
