Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
April 22, 2024
Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Olatunji Ruwase, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yunan Zhang, Xiren Zhou
cs.AI
Abstract
We introduce phi-3-mini, a 3.8 billion parameter language model trained on
3.3 trillion tokens, whose overall performance, as measured by both academic
benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and
GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite
being small enough to be deployed on a phone. The innovation lies entirely in
our dataset for training, a scaled-up version of the one used for phi-2,
composed of heavily filtered web data and synthetic data. The model is also
further aligned for robustness, safety, and chat format. We also provide some
initial parameter-scaling results with 7B and 14B models trained for 4.8T
tokens, called phi-3-small and phi-3-medium, both significantly more capable
than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on
MT-bench).
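Since the headline claim is that phi-3-mini is small enough to run locally, a minimal local-inference sketch may help readers try it. This is not taken from the report itself: it assumes the weights are published on the Hugging Face Hub under an id such as microsoft/Phi-3-mini-4k-instruct (verify the actual model card) and uses the standard transformers generation API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; check the actual Hugging Face model card for the release.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: ~3.8B params fit in a few GB
    device_map="auto",          # uses a GPU if present, otherwise CPU
    trust_remote_code=True,     # may be required for newly released architectures
)

# The model is aligned for chat format, so build the prompt via the chat template.
messages = [
    {"role": "user", "content": "Explain in one sentence why the sky is blue."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

On-device (phone) deployment would additionally require a quantized or exported runtime (e.g., 4-bit weights via ONNX or llama.cpp-style tooling); the sketch above only demonstrates desktop-class local inference.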