Unified Neural Scaling Law

Abstract

We present a functional form, which we refer to as a Unified Neural Scaling Law (UNSL), that accurately models and extrapolates the scaling behavior of deep neural networks when multiple dimensions vary simultaneously. That is, it describes how the evaluation metric of interest changes as one jointly varies the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters. The form applies across various architectures and to each task in a varied set of upstream and downstream tasks, which includes large-scale vision, language, math, and reinforcement learning. Compared with other functional forms for neural scaling, it yields considerably more accurate extrapolations of scaling behavior on this set.