Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Abstract

Large language model (LLM) scaling laws are empirical formulas that estimate how model quality changes as parameter count and training data increase. However, these formulas, including the popular DeepMind Chinchilla scaling laws, do not account for the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size for training and deploying a model of a given quality under a given inference demand. We conduct our analysis in terms of both a compute budget and real-world costs, and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models that are smaller, and trained for longer, than Chinchilla-optimal.
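The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption not stated in the abstract: the loss constants are the parametric Chinchilla fit reported by Hoffmann et al. (2022), training and inference costs use the common approximations of 6ND and 2N FLOPs per token, inference demand is given directly in tokens rather than requests, and all function names are hypothetical.

```python
import math

# Assumed constants: parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta
# as reported by Hoffmann et al. (2022). Not taken from this abstract.
A, B, E = 406.4, 410.7, 1.69
ALPHA, BETA = 0.34, 0.28


def loss(n_params, n_tokens):
    """Predicted pre-training loss for N parameters and D training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_loss(n_params, target_loss):
    """Training tokens a model of n_params needs to reach target_loss.

    Returns math.inf when the model is too small to ever reach it.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return math.inf
    return (B / gap) ** (1 / BETA)


def total_flops(n_params, target_loss, inference_tokens):
    """Assumed cost model: 6*N*D for training plus 2*N per inference token."""
    d_train = tokens_for_loss(n_params, target_loss)
    return 6 * n_params * d_train + 2 * n_params * inference_tokens


def best_size(target_loss, inference_tokens, lo=1e8, hi=1e12, steps=2000):
    """Log-spaced grid search for the parameter count minimizing total FLOPs."""
    best_n, best_cost = None, math.inf
    for i in range(steps + 1):
        n = lo * (hi / lo) ** (i / steps)
        cost = total_flops(n, target_loss, inference_tokens)
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n, tokens_for_loss(best_n, target_loss)


if __name__ == "__main__":
    # Quality target: the predicted loss of a 70B model trained on 1.4T tokens.
    target = loss(70e9, 1.4e12)
    for demand in (0.0, 1e13):  # lifetime inference tokens (hypothetical values)
        n, d = best_size(target, demand)
        print(f"inference tokens={demand:.1e}: N={n:.3e}, D={d:.3e}")
```

Under these assumptions, adding inference demand shifts the minimum-cost configuration toward fewer parameters and more training tokens at the same predicted loss, which is the qualitative behavior the abstract describes.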
