Textbooks Are All You Need

Abstract

We introduce phi-1, a new large language model for code that is significantly smaller than competing models. phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and textbooks and exercises synthetically generated with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and to phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
