Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Abstract

Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including the popular DeepMind Chinchilla scaling laws, do not account for the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand. We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train smaller models for longer than Chinchilla-optimal.
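The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the coefficients reported by Hoffmann et al. (2022), the common approximations of 6·N·D FLOPs for training and 2·N·D FLOPs for inference, and a simple grid search. The function names, the target loss of 2.0, and the inference volume of 1e12 tokens are hypothetical choices made only for illustration.

```python
# Assumption: Chinchilla parametric loss L(N, D) = E + A / N**ALPHA + B / D**BETA,
# with the fitted coefficients reported by Hoffmann et al. (2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def tokens_for_loss(n_params, target_loss):
    """Training tokens needed for a model with n_params to reach target_loss.

    Returns None when the model is too small to reach the target at all.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return None
    return (B / gap) ** (1.0 / BETA)


def cheapest_model(target_loss, inference_tokens, grid_points=2000):
    """Grid-search the parameter count that minimizes total FLOPs.

    Assumption: training costs about 6*N*D FLOPs and inference about
    2*N*D_inf FLOPs. Returns (n_params, train_tokens, total_flops).
    """
    best = None
    for i in range(grid_points + 1):
        # Log-spaced grid from 1e9 to 1e12 parameters (hypothetical range).
        n = 10 ** (9 + 3 * i / grid_points)
        d = tokens_for_loss(n, target_loss)
        if d is None:
            continue
        flops = 6 * n * d + 2 * n * inference_tokens
        if best is None or flops < best[2]:
            best = (n, d, flops)
    return best


if __name__ == "__main__":
    for demand in (0.0, 1e12):
        n, d, flops = cheapest_model(2.0, demand)
        print(f"inference tokens={demand:.0e}: N={n:.3e}, D={d:.3e}, FLOPs={flops:.3e}")
```

With zero inference demand the search recovers a compute-optimal split for the target loss under these assumptions. Adding inference demand makes every parameter more expensive over the model's lifetime, so the minimum moves toward fewer parameters and more training tokens, which is the qualitative result the abstract states.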
