Textbooks Are All You Need
June 20, 2023
Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li
cs.AI
Abstract
We introduce phi-1, a new large language model for code that is significantly smaller than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
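
For context on the headline numbers: HumanEval and MBPP results are reported as pass@1, i.e., the probability that a sampled completion passes a problem's unit tests. The sketch below shows the standard unbiased pass@k estimator from Chen et al. (2021), the paper that introduced HumanEval; it is not taken from the phi-1 paper, and the function name and sample counts are illustrative only.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions per problem,
    of which c pass the unit tests, estimate P(at least one of k passes)."""
    if n - c < k:  # every size-k subset must contain a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 200 samples for one problem, 101 pass its tests.
print(pass_at_k(n=200, c=101, k=1))  # 0.505 -- pass@1 reduces to c / n
```

With k=1 the estimator reduces to c/n, the fraction of generations that pass; the benchmark score is this quantity averaged over all problems.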