
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

December 31, 2023
作者: Nikhil Sardana, Jonathan Frankle
cs.AI

Abstract

Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including the popular DeepMind Chinchilla scaling laws, neglect to include the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand. We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal.
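The trade-off the abstract describes can be sketched numerically. The sketch below assumes the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β with the coefficient values fitted by Hoffmann et al. (the paper fits its own variants, so these constants are illustrative), the standard approximations of ~6ND FLOPs for training and ~2N FLOPs per inference token, and a simple grid scan rather than the paper's analytic solution:

```python
import math

# Chinchilla loss-curve coefficients (Hoffmann et al., 2022) -- treated
# here as assumed illustrative constants, not the paper's own fits.
A, B, E = 406.4, 410.7, 1.69
ALPHA, BETA = 0.34, 0.28

def data_for_loss(n_params, target_loss):
    """Tokens D needed so that E + A/N^alpha + B/D^beta equals target_loss."""
    slack = target_loss - E - A / n_params**ALPHA
    if slack <= 0:
        return None  # this N cannot reach the target loss at any data size
    return (B / slack) ** (1 / BETA)

def total_flops(n_params, train_tokens, inference_tokens):
    """Training ~6*N*D FLOPs plus inference ~2*N FLOPs per generated token."""
    return 6 * n_params * train_tokens + 2 * n_params * inference_tokens

def optimal_model(target_loss, inference_tokens, n_grid):
    """Scan candidate parameter counts; return (cost, N, D) minimizing FLOPs."""
    best = None
    for n in n_grid:
        d = data_for_loss(n, target_loss)
        if d is None:
            continue
        cost = total_flops(n, d, inference_tokens)
        if best is None or cost < best[0]:
            best = (cost, n, d)
    return best

# Log-spaced grid of candidate parameter counts, 1e9 to 1e12.
grid = [10 ** (9 + 3 * i / 200) for i in range(201)]
no_inference = optimal_model(2.0, 0.0, grid)
heavy_demand = optimal_model(2.0, 2e12, grid)  # e.g. ~1B requests of ~2k tokens
# With nonzero inference demand, the optimum shifts toward a smaller N
# trained on more tokens D -- the "smaller and longer" result in the abstract.
```

Running both cases at the same target loss shows the shift: the heavy-demand optimum selects fewer parameters and more training tokens than the training-compute-only optimum, since each inference token's cost scales linearly with N.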