Unified Neural Scaling Law

Abstract

We present a functional form, which we refer to as a Unified Neural Scaling Law (UNSL), that accurately models and extrapolates the scaling behavior of deep neural networks as multiple dimensions vary simultaneously. That is, it captures how the evaluation metric of interest changes as one jointly varies the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters. The form applies across various architectures and to each task in a diverse set of upstream and downstream tasks spanning large-scale vision, language, math, and reinforcement learning. Compared to other functional forms for neural scaling, UNSL yields considerably more accurate extrapolations of scaling behavior on this set.