
Textbooks Are All You Need

June 20, 2023
Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li
cs.AI

Abstract

We introduce phi-1, a new large language model for code, significantly smaller than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and to phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1, which still achieves 45% on HumanEval.
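The headline numbers are pass@1 scores: the fraction of problems solved when completions sampled from the model must pass the benchmark's unit tests. As a minimal sketch of how such a score is computed, below is the standard unbiased pass@k estimator introduced with the HumanEval benchmark (Chen et al., 2021); the sample counts in the usage line are illustrative, not the ones used to evaluate phi-1.

import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021, HumanEval).

    n: number of sampled completions per problem
    c: number of those completions that pass all unit tests
    k: sample budget counted toward success
    Returns the probability that at least one of k samples is correct.
    """
    if n - c < k:
        return 1.0  # fewer than k failures, so any k draws include a pass
    # 1 - C(n-c, k) / C(n, k), written as a numerically stable product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Illustrative only: 200 samples of which 101 pass gives pass@1 = 101/200
print(pass_at_k(200, 101, 1))  # 0.505

For k = 1 the estimator reduces to c/n, so pass@1 is simply the empirical pass rate; the product form matters for larger k, where naive binomial coefficients overflow.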