Towards Optimal Learning of Language Models

Abstract

This work studies general principles for improving the learning of language models (LMs), with the aim of reducing the number of training steps needed to achieve superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio, taking an "LM-training-as-lossless-compression" view. We then derive a theorem, named the Learning Law, that reveals the properties of the dynamics of the optimal learning process under this objective. The theorem is validated by experiments on a linear classification task and a real-world language modeling task. Finally, we empirically verify that the optimal learning of LMs essentially stems from an improvement of the coefficients in the scaling law of LMs, which indicates great promise and significance for designing practical learning acceleration methods. Our code can be found at https://aka.ms/LearningLaw.
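As a rough illustration of the "LM-training-as-lossless-compression" view, the sketch below computes a compression ratio from a training loss curve. It relies on assumptions that are not stated in the abstract: each batch is encoded with the model before the model is trained on it, arithmetic coding costs about -log2 p(token) bits per token, and the uncompressed baseline is a uniform code over the vocabulary. The function names, the exponential loss curves, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def compression_ratio(losses_nats, tokens_per_step, vocab_size):
    """Ratio of raw size to coded size for a stream coded during training.

    losses_nats: per-step average cross-entropy loss in nats per token,
                 measured on each batch before the model is updated on it.
    """
    # Baseline (assumption): a uniform code spends log2(vocab_size) bits per token.
    raw_bits = len(losses_nats) * tokens_per_step * math.log2(vocab_size)
    # Arithmetic coding spends about loss / ln(2) bits per token.
    coded_bits = sum(loss / math.log(2) for loss in losses_nats) * tokens_per_step
    return raw_bits / coded_bits


def toy_loss_curve(steps, rate, start=10.0, floor=2.0):
    """Hypothetical loss curve decaying exponentially from `start` to `floor`."""
    return [floor + (start - floor) * math.exp(-rate * t) for t in range(steps)]


# Two synthetic learners that reach the same floor at different speeds.
slow = toy_loss_curve(steps=1000, rate=0.002)
fast = toy_loss_curve(steps=1000, rate=0.01)

ratio_slow = compression_ratio(slow, tokens_per_step=1024, vocab_size=50_000)
ratio_fast = compression_ratio(fast, tokens_per_step=1024, vocab_size=50_000)
print(f"slow learner: {ratio_slow:.2f}x, fast learner: {ratio_fast:.2f}x")
```

Under these assumptions the coded size is proportional to the sum of the per-step losses, so a learner whose loss falls faster achieves a higher compression ratio even if both learners converge to the same final loss.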
