Observational Scaling Laws and the Predictability of Language Model Performance

Abstract

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from roughly 80 publicly available models. Building a single scaling law from multiple model families is challenging because the families vary widely in training compute efficiency and in capabilities. However, we show that these variations are consistent with a simple, generalized scaling law in which language model performance is a function of a low-dimensional capability space, and model families differ only in how efficiently they convert training compute into capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and the impact of post-training interventions such as Chain-of-Thought and Self-Consistency can be predicted as language model capabilities continue to improve.
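The two ingredients named in the abstract, a low-dimensional capability space and a sigmoidal link to downstream performance, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, a single principal component stands in for the capability space, and the names `principal_capability`, `fit_sigmoid`, and `predict` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for public benchmark results (assumption, not real data):
# 80 hypothetical models scored on 5 benchmarks, driven by one hidden capability.
n_models, n_benchmarks = 80, 5
latent = rng.normal(size=n_models)
loadings = rng.uniform(0.5, 1.5, size=n_benchmarks)
scores = np.outer(latent, loadings) + 0.05 * rng.normal(size=(n_models, n_benchmarks))


def principal_capability(scores):
    """Return each model's coordinate along the first principal component."""
    centered = scores - scores.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    if direction.sum() < 0:  # fix the sign so that higher means more capable
        direction = -direction
    return centered @ direction


def fit_sigmoid(capability, accuracy, eps=1e-6):
    """Fit accuracy ~ sigmoid(w * capability + b) by least squares on the logit."""
    p = np.clip(accuracy, eps, 1 - eps)
    logit = np.log(p / (1 - p))
    w, b = np.polyfit(capability, logit, deg=1)  # returns slope, then intercept
    return w, b


def predict(capability, w, b):
    """Evaluate the fitted sigmoid at the given capability values."""
    return 1.0 / (1.0 + np.exp(-(w * capability + b)))


capability = principal_capability(scores)

# Synthetic downstream accuracy that is a sigmoid of the hidden capability.
downstream = 1.0 / (1.0 + np.exp(-(2.0 * latent - 1.0)))

# Fit on the 60 weakest models, then extrapolate to the 20 strongest.
order = np.argsort(capability)
train, test = order[:60], order[60:]
w, b = fit_sigmoid(capability[train], downstream[train])
max_error = np.max(np.abs(predict(capability[test], w, b) - downstream[test]))
print(f"slope={w:.3f}, intercept={b:.3f}, max extrapolation error={max_error:.4f}")
```

Fitting only on the weaker models and evaluating on the stronger ones mirrors the abstract's claim that downstream behavior is predictable from small models. On real data the capability space may need more than one component, and per-family compute efficiency would have to be modeled separately.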
