Textbooks Are All You Need II: phi-1.5 technical report
September 11, 2023
Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
cs.AI
Abstract
We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.
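
Since the model is released openly, a minimal inference sketch in Python may help readers experiment with it. The snippet assumes the Hugging Face checkpoint id "microsoft/phi-1_5" and the standard transformers generation API; neither detail comes from the abstract, and the prompt is only an illustration.

# Minimal sketch: load the open-sourced phi-1.5 and generate a completion.
# The checkpoint id "microsoft/phi-1_5" is an assumption, not stated in the report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A grade-school style question with a "think step by step" cue, echoing the
# reasoning behaviors described in the abstract.
prompt = "Alice has 3 apples and buys 2 more. How many apples does she have? Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))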