Textbooks Are All You Need

Abstract

We introduce phi-1, a new large language model for code that is significantly smaller than competing models. phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100 GPUs using a selection of "textbook quality" data from the web (6B tokens) and textbooks and exercises synthetically generated with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and to phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
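
The abstract reports pass@1 without defining it. As a rough illustration, the sketch below assumes the commonly used unbiased pass@k estimator from the HumanEval benchmark literature: with n generated samples per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1. The text here does not confirm that phi-1 was scored with exactly this procedure, and the function names and toy numbers are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes,
    given n generated samples of which c passed the unit tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when n - c < k, so the estimate is exactly 1.0 then.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c) for one problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results (n samples, c correct); not data from the paper.
toy_results = [(10, 5), (10, 0), (10, 10), (10, 1)]
print(mean_pass_at_k(toy_results, k=1))
```

Under this assumption, a benchmark-level pass@1 score such as the 50.6% quoted above would be the mean of the per-problem c/n ratios.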
