Towards Optimal Learning of Language Models

Abstract

This work studies general principles for improving the learning of language models (LMs), with the aim of reducing the number of training steps needed to achieve superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio under an "LM-training-as-lossless-compression" view. We then derive a theorem, named the Learning Law, that reveals properties of the dynamics of the optimal learning process under this objective. The theorem is validated by experiments on a linear classification task and a real-world language modeling task. Finally, we empirically verify that the optimal learning of LMs essentially stems from improving the coefficients in the scaling law of LMs, which indicates great promise and significance for designing practical learning acceleration methods. Our code can be found at https://aka.ms/LearningLaw.
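The abstract does not spell out the objective, so what follows is a minimal sketch under two stated assumptions. First, the "LM-training-as-lossless-compression" view is read here as prequential coding: each training step's data is encoded with the model trained so far, so the total code length equals the area under the loss curve (measured in bits per token), and for fixed data, maximizing the compression ratio amounts to minimizing that area. Second, the loss is assumed to follow a hypothetical power-law form L(t) = L0 + (B/t)^β, used only to illustrate how better scaling-law coefficients reduce the steps needed to reach a target loss. The function names (`loss_curve`, `compression_ratio`, `steps_to_reach`) and all numeric values are illustrative and are not taken from the paper's code.

```python
import math


def loss_curve(t_max, l0, b, beta):
    """Hypothetical per-step loss in bits/token, assuming L(t) = L0 + (B / t) ** beta."""
    return [l0 + (b / t) ** beta for t in range(1, t_max + 1)]


def compression_ratio(losses, raw_bits_per_token):
    """Compression ratio under the assumed prequential-coding reading.

    Each step's loss is the cost (bits/token) of encoding that step's tokens with
    the model trained so far, so the total code length is the area under the loss
    curve. The uncompressed size is raw_bits_per_token for every step. With the
    data fixed, a smaller area under the curve means a higher ratio.
    """
    code_length = sum(losses)
    return raw_bits_per_token * len(losses) / code_length


def steps_to_reach(target, l0, b, beta):
    """Smallest integer t with L0 + (B / t) ** beta <= target (requires target > L0)."""
    if target <= l0:
        raise ValueError("target must be above the irreducible loss L0")
    # Solve (B / t) ** beta <= target - L0 for t: t >= B / (target - L0) ** (1 / beta).
    return math.ceil(b / (target - l0) ** (1.0 / beta))


if __name__ == "__main__":
    # Hypothetical coefficients: the "improved" process only has a smaller B.
    baseline = dict(l0=2.0, b=100.0, beta=0.5)
    improved = dict(l0=2.0, b=50.0, beta=0.5)
    # Assumed uncompressed cost: a uniform code over a 65,536-token vocabulary (16 bits).
    raw_bits = math.log2(65536)

    cr_base = compression_ratio(loss_curve(1000, **baseline), raw_bits)
    cr_fast = compression_ratio(loss_curve(1000, **improved), raw_bits)
    print(f"compression ratio: {cr_base:.2f} -> {cr_fast:.2f}")
    print("steps to reach loss 2.5:",
          steps_to_reach(2.5, **baseline), "->", steps_to_reach(2.5, **improved))
```

With these made-up numbers, halving B lowers every point of the loss curve, which raises the compression ratio, and it halves the steps needed to reach a loss of 2.5 (400 to 200). This is only an illustration of why improved scaling-law coefficients translate into fewer training steps; it is not a reproduction of the Learning Law or of the paper's experiments.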
