
Algorithmic progress in language models

March 9, 2024
Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla
cs.AI

Abstract

We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
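To make the headline estimate concrete, here is a minimal sketch (illustrative only, not the paper's analysis code): if the compute needed to reach a fixed Wikitext or Penn Treebank perplexity halves every h months, then algorithmic progress alone stretches a fixed physical-compute budget by a factor of 2^(months elapsed / h). The function name and the 11-year window are choices made for this example; the halving times come from the abstract's point estimate and 95% confidence interval.

```python
# Sketch of the abstract's headline arithmetic, not the paper's actual code:
# a halving time of h months means a fixed compute budget is effectively
# amplified by 2 ** (months_elapsed / h) through algorithmic progress alone.

def effective_compute_multiplier(years: float, halving_months: float) -> float:
    """Factor by which algorithms stretch a fixed compute budget over `years`."""
    return 2.0 ** (12.0 * years / halving_months)

# Point estimate (8 months) and 95% CI bounds (5 and 14 months), accumulated
# over the 2012-2023 study window (~11 years):
for hm in (5.0, 8.0, 14.0):
    print(f"halving every {hm:4.1f} months -> "
          f"~{effective_compute_multiplier(11.0, hm):.1e}x effective compute")
```

Even at the slow end of the confidence interval (14 months), the implied effective-compute gain over the decade is on the order of hundreds of times, which is why the abstract can compare the rate directly against Moore's Law.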
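For intuition about what an "augmented scaling law" can look like, one common parameterization folds algorithmic progress into a Chinchilla-style loss curve by letting exponential efficiency terms discount the model-size and data-size contributions. The specific form below is an illustrative assumption, not necessarily the one estimated in the paper; g_N and g_D are assumed yearly efficiency-growth rates and t_0 is a reference year.

```latex
% Illustrative augmented scaling law (assumed form, for exposition only):
% progress at rates g_N, g_D inflates the effective model size N and
% effective dataset size D at time t relative to a reference year t_0.
L(N, D, t) = E
  + \frac{A}{\left(N \, e^{g_N (t - t_0)}\right)^{\alpha}}
  + \frac{B}{\left(D \, e^{g_D (t - t_0)}\right)^{\beta}}
```

Fitting a model of this kind to benchmark data separates the loss reduction attributable to growth in N and D (scaling) from that attributable to the time-dependent efficiency terms (algorithms), which is the decomposition the abstract describes.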
