Title: Llemma: An Open Language Model For Mathematics

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
