Textbooks Are All You Need II: phi-1.5 technical report
September 11, 2023
Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
cs.AI
Abstract
We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.
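
Since the model is released openly, a minimal inference sketch in Python may help readers experiment with it. The snippet assumes the Hugging Face checkpoint id "microsoft/phi-1_5" and the standard transformers generation API; neither detail comes from the abstract, and the prompt is only an illustration.

# Minimal sketch: load the open-sourced phi-1.5 and generate a completion.
# The checkpoint id "microsoft/phi-1_5" is an assumption, not stated in the report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A grade-school style question with a "think step by step" cue, echoing the
# reasoning behaviors described in the abstract.
prompt = "Alice has 3 apples and buys 2 more. How many apples does she have? Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))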