통합 신경망 스케일링 법칙

초록

우리는 다양한 아키텍처와 다양한 업스트림 및 다운스트림 작업에 속한 각 작업에 대해, 여러 차원(즉, 모델 파라미터 수, 훈련 데이터셋 크기, 훈련 단계 수, 추론 단계 수, 연산량 및 다양한 하이퍼파라미터)이 동시에 모두 변할 때 관심 있는 평가 지표가 어떻게 변하는지 정확하게 모델링하고 외삽하는 함수 형태(통합 신경망 스케일링 법칙(UNSL)이라고 함)를 제시한다. 이 세트에는 대규모 비전, 언어, 수학 및 강화 학습이 포함된다. 다른 신경망 스케일링 함수 형태와 비교할 때, 이 함수 형태는 이 세트에서 스케일링 행동의 외삽을 상당히 더 정확하게 제공한다.

English

We present a functional form (that we refer to as a Unified Neural Scaling Law (UNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks as multiple dimensions all vary simultaneously (i.e. how the evaluation metric of interest varies as one simultaneously varies the number of model parameters, training dataset size, number of training steps, number of inference steps, amount of compute, and various hyperparameters) for various architectures and for each of various tasks within a varied set of upstream and downstream tasks. This set includes large-scale vision, language, math, and reinforcement learning. When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set.