Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
December 31, 2023
Authors: Nikhil Sardana, Jonathan Frankle
cs.AI
Abstract
Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including the popular DeepMind Chinchilla scaling laws, neglect to include the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand. We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal.
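
As a rough illustration of the trade-off described in the abstract, the sketch below combines the usual approximations of ~6ND training FLOPs and ~2ND inference FLOPs (for N parameters and D tokens) with the Chinchilla parametric loss L(N, D) = E + A/N^α + B/D^β, and grid-searches for the model size that reaches a target loss at the lowest combined training-plus-inference compute. The loss constants are the fitted values reported by Hoffmann et al. (2022); the target loss and inference-token volume are illustrative assumptions, and the search itself is a simplified version of the kind of analysis the paper describes, not its exact method.

```python
import numpy as np

# Chinchilla parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta,
# with the constants reported by Hoffmann et al. (2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def total_flops(n_params, n_train_tokens, n_inference_tokens):
    """Approximate training (~6*N*D) plus inference (~2*N*D) FLOPs."""
    return 6 * n_params * n_train_tokens + 2 * n_params * n_inference_tokens

def cheapest_model(target_loss, n_inference_tokens,
                   param_grid=np.logspace(8, 12, 400)):
    """Grid-search over parameter counts, solve for the training-token count
    that reaches target_loss, and keep the (N, D) pair with the lowest
    combined training + inference compute."""
    best = None
    for n in param_grid:
        gap = target_loss - E - A / n**alpha
        if gap <= 0:  # model too small to ever reach the target loss
            continue
        d = (B / gap) ** (1 / beta)          # tokens needed at this size
        c = total_flops(n, d, n_inference_tokens)
        if best is None or c < best[2]:
            best = (n, d, c)
    return best  # (parameters, training tokens, total FLOPs)

# Illustrative inference demand: ~1B requests at ~500 tokens each (assumed).
n_params, n_tokens, flops = cheapest_model(target_loss=2.0,
                                           n_inference_tokens=5e11)
print(f"params ~ {n_params:.3g}, train tokens ~ {n_tokens:.3g}, "
      f"total FLOPs ~ {flops:.3g}")
```

Increasing the assumed inference-token volume in this sketch pushes the minimum-compute solution toward smaller models trained on more tokens, which is the qualitative conclusion the abstract draws for researchers expecting large inference demand.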