Towards Optimal Learning of Language Models

February 27, 2024
Authors: Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei
cs.AI

Abstract

This work studies the general principles of improving the learning of language models (LMs), aiming to reduce the training steps needed to achieve superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio in an "LM-training-as-lossless-compression" view. Then, we derive a theorem, named Learning Law, to reveal the properties of the dynamics in the optimal learning process under our objective. The theorem is then validated by experiments on a linear classification task and a real-world language modeling task. Finally, we empirically verify that the optimal learning of LMs essentially stems from the improvement of the coefficients in the scaling law of LMs, indicating great promise and significance for designing practical learning acceleration methods. Our code can be found at https://aka.ms/LearningLaw.
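As a rough illustration of the objective described in the abstract (a minimal sketch with assumed notation, not the paper's own formulation): in the lossless-compression view, the bits an online-trained LM needs to encode a corpus accumulate from its log-loss on each data chunk before the corresponding update, so maximizing the compression ratio amounts to minimizing the area under the training loss curve; the remark about scaling-law coefficients can similarly be read against a generic step-wise scaling form. Here $q_{\theta_t}$ denotes the LM after $t$ updates, $x_t$ the $t$-th data chunk, $\ell$ the log-loss, and $L_0$, $B$, $\beta$ generic scaling-law coefficients; all symbols are illustrative assumptions.

\[
\mathrm{CR}(\theta_{0:T-1}) \;=\; \frac{\text{raw bits of } x_{1:T}}{\sum_{t=1}^{T} -\log_2 q_{\theta_{t-1}}(x_t)},
\qquad
\max \mathrm{CR} \;\Longleftrightarrow\; \min \sum_{t=1}^{T} \ell(\theta_{t-1}; x_t),
\qquad
\ell(\theta_t) \;\approx\; L_0 + \left(\frac{B}{t}\right)^{\beta}.
\]

Under this reading, a better learning policy is one that shrinks the accumulated log-loss, which shows up as improved coefficients (e.g., smaller $B$ or larger $\beta$) in the fitted step-wise scaling law.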