Unified Neural Scaling Law

Abstract

We present a functional form, which we call the Unified Neural Scaling Law (UNSL), that accurately models and extrapolates the scaling behavior of deep neural networks when multiple dimensions vary simultaneously, i.e., how the evaluation metric of interest changes as the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters are varied together. The form applies across various architectures and to each task in a varied set of upstream and downstream tasks that includes large-scale vision, language, math, and reinforcement learning. Compared with other functional forms for neural scaling, it yields considerably more accurate extrapolations of scaling behavior on this set.