
Textbooks Are All You Need II: phi-1.5 technical report

September 11, 2023
Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
cs.AI

Abstract

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.
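Since phi-1.5 is released as an open model, a minimal sketch of querying it with the Hugging Face transformers library is shown below. The Hub identifier "microsoft/phi-1_5", the prompt, and the generation settings are illustrative assumptions rather than details taken from the report.

```python
# Minimal sketch: load the open-sourced phi-1.5 checkpoint and prompt it to
# "think step by step" on a grade-school style word problem, as the abstract
# describes. Model ID and settings are assumptions, not from the report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed Hugging Face Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "Alice has 3 apples and buys 2 more. How many apples does she have now? "
    "Let's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```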