Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
December 31, 2023
Authors: Nikhil Sardana, Jonathan Frankle
cs.AI
Abstract
Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including the popular DeepMind Chinchilla scaling laws, neglect to include the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand. We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal.
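
As a rough illustration of the trade-off described in the abstract, the sketch below combines the usual approximations of ~6ND training FLOPs and ~2ND inference FLOPs (for N parameters and D tokens) with the Chinchilla parametric loss L(N, D) = E + A/N^α + B/D^β, and grid-searches for the model size that reaches a target loss at the lowest combined training-plus-inference compute. The loss constants are the fitted values reported by Hoffmann et al. (2022); the target loss and inference-token volume are illustrative assumptions, and the search itself is a simplified version of the kind of analysis the paper describes, not its exact method.

```python
import numpy as np

# Chinchilla parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta,
# with the constants reported by Hoffmann et al. (2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def total_flops(n_params, n_train_tokens, n_inference_tokens):
    """Approximate training (~6*N*D) plus inference (~2*N*D) FLOPs."""
    return 6 * n_params * n_train_tokens + 2 * n_params * n_inference_tokens

def cheapest_model(target_loss, n_inference_tokens,
                   param_grid=np.logspace(8, 12, 400)):
    """Grid-search over parameter counts, solve for the training-token count
    that reaches target_loss, and keep the (N, D) pair with the lowest
    combined training + inference compute."""
    best = None
    for n in param_grid:
        gap = target_loss - E - A / n**alpha
        if gap <= 0:  # model too small to ever reach the target loss
            continue
        d = (B / gap) ** (1 / beta)          # tokens needed at this size
        c = total_flops(n, d, n_inference_tokens)
        if best is None or c < best[2]:
            best = (n, d, c)
    return best  # (parameters, training tokens, total FLOPs)

# Illustrative inference demand: ~1B requests at ~500 tokens each (assumed).
n_params, n_tokens, flops = cheapest_model(target_loss=2.0,
                                           n_inference_tokens=5e11)
print(f"params ~ {n_params:.3g}, train tokens ~ {n_tokens:.3g}, "
      f"total FLOPs ~ {flops:.3g}")
```

Increasing the assumed inference-token volume in this sketch pushes the minimum-compute solution toward smaller models trained on more tokens, which is the qualitative conclusion the abstract draws for researchers expecting large inference demand.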