Llemma: An Open Language Model For Mathematics
October 16, 2023
Authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
cs.AI
Abstract
We present Llemma, a large language model for mathematics. We continue
pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web
data containing mathematics, and mathematical code, yielding Llemma. On the
MATH benchmark, Llemma outperforms all known open base models, as well as the
unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is
capable of tool use and formal theorem proving without any further finetuning.
We openly release all artifacts, including 7 billion and 34 billion parameter
models, the Proof-Pile-2, and code to replicate our experiments.