

Towards Optimal Learning of Language Models

February 27, 2024
Authors: Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei
cs.AI

Abstract

This work studies the general principles of improving the learning of language models (LMs), with the aim of reducing the training steps needed to achieve superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio under an "LM-training-as-lossless-compression" view. We then derive a theorem, named the Learning Law, that reveals the properties of the dynamics of the optimal learning process under our objective. The theorem is validated by experiments on a linear classification task and a real-world language modeling task. Finally, we empirically verify that the optimal learning of LMs essentially stems from improving the coefficients in the scaling law of LMs, indicating great promise and significance for designing practical learning acceleration methods. Our code can be found at https://aka.ms/LearningLaw.
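To make the compression view concrete, below is a minimal illustrative sketch (not the paper's code) of how training can be read as lossless compression: under prequential (online) coding, the bits needed to encode the training stream with the evolving model roughly equal the area under the training loss curve, so maximizing the compression ratio amounts to driving the loss down in as few steps as possible. The function name `compression_ratio`, the hypothetical loss curves, and the vocabulary size are all assumptions for illustration; the improved exponent in the second curve only gestures at the abstract's point that acceleration shows up as better scaling-law coefficients.

```python
import numpy as np

def compression_ratio(loss_curve_nats, tokens_per_step, vocab_size):
    """Illustrative only: loss_curve_nats[i] is the mean training loss
    (nats/token) at step i. Under prequential coding, the bits used to
    encode the stream are approximately the area under this curve."""
    loss_bits = np.asarray(loss_curve_nats) / np.log(2)      # nats -> bits per token
    coded_bits = np.sum(loss_bits * tokens_per_step)         # area under the loss curve
    raw_bits = np.log2(vocab_size) * tokens_per_step * len(loss_bits)
    return raw_bits / coded_bits                              # higher = better compression

# Hypothetical loss curves of the same power-law form; the faster-decaying
# exponent stands in for "improved scaling-law coefficients".
steps = np.arange(1, 1001, dtype=float)
baseline = 2.0 + 6.0 * steps ** -0.3
improved = 2.0 + 6.0 * steps ** -0.5
print(compression_ratio(baseline, 4096, 32000))
print(compression_ratio(improved, 4096, 32000))
```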