Algorithmic progress in language models
March 9, 2024
Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla
cs.AI
Abstract
We investigate the rate at which algorithms for pre-training language models
have improved since the advent of deep learning. Using a dataset of over 200
language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we
find that the compute required to reach a set performance threshold has halved
approximately every 8 months, with a 95% confidence interval of around 5 to 14
months, substantially faster than hardware gains per Moore's Law. We estimate
augmented scaling laws, which enable us to quantify algorithmic progress and
determine the relative contributions of scaling models versus innovations in
training algorithms. Despite the rapid pace of algorithmic progress and the
development of new architectures such as the transformer, our analysis reveals
that the increase in compute made an even larger contribution to overall
performance improvements over this time period. Though limited by noisy
benchmark data, our analysis quantifies the rapid progress in language
modeling, shedding light on the relative contributions from compute and
algorithms.
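The headline figure — compute requirements halving roughly every 8 months — implies an exponential "effective compute" multiplier over the study window. A minimal sketch of that arithmetic (illustrative only; the function name and the fixed 8-month parameter are assumptions for this example, not the paper's estimation code):

```python
def algorithmic_gain(years, halving_months=8.0):
    """Factor by which the compute needed to hit a fixed performance
    threshold shrinks after `years`, given a constant halving time.

    Equivalent to 2 raised to the number of halving periods elapsed.
    """
    return 2 ** (12.0 * years / halving_months)


# After exactly one halving period (8 months), the gain factor is 2x.
print(algorithmic_gain(8.0 / 12.0))

# Over the 2012-2023 window (~11 years), the implied reduction in
# required compute is on the order of tens of thousands; the paper's
# 95% CI of 5-14 months makes this figure highly uncertain.
print(algorithmic_gain(11.0))
```

The same formula with a 5- or 14-month halving time reproduces the wide range the confidence interval implies, which is why the paper reports the interval rather than a point estimate alone.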