Algorithmic progress in language models

Abstract

We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
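To make the halving-time figures concrete, the short sketch below converts a halving time in months into an equivalent yearly reduction factor in required compute. It is only an arithmetic illustration: the function name `annual_reduction_factor` is hypothetical, and the 24-month doubling period used for Moore's Law is an assumption for comparison, not a figure taken from the abstract.

```python
def annual_reduction_factor(halving_months: float) -> float:
    """Factor by which required compute shrinks per year,
    given that it halves every `halving_months` months."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    return 2.0 ** (12.0 / halving_months)


if __name__ == "__main__":
    # Central estimate and 95% confidence bounds quoted in the abstract.
    for label, months in [("central (8 mo)", 8.0),
                          ("fast bound (5 mo)", 5.0),
                          ("slow bound (14 mo)", 14.0)]:
        print(f"{label}: {annual_reduction_factor(months):.2f}x per year")

    # Assumed comparison point: hardware performance doubling every 24 months.
    print(f"Moore's Law (assumed 24 mo): {annual_reduction_factor(24.0):.2f}x per year")
```

Under these assumptions, even the slow end of the confidence interval (14 months) corresponds to a larger yearly factor than a 24-month hardware doubling period, which is the comparison the abstract draws.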
