Densing Law of LLMs

Abstract

Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as model size increases. However, this scaling poses great challenges for training and inference efficiency, particularly when deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces "capacity density" as a new metric for evaluating the quality of LLMs across different scales, and uses it to describe the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law that predicts the downstream performance of these reference models from their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size a reference model would need to achieve equivalent performance, and formalize capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density thus provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law): the capacity density of LLMs grows exponentially over time. More specifically, when evaluated on some widely used benchmarks, the capacity density of LLMs doubles approximately every three months. The law offers a new perspective to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
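
As an illustration only, the definition of capacity density and the three-month doubling trend can be sketched in Python. The abstract does not state the functional form of the reference scaling law, so `toy_reference_curve` below is a made-up placeholder that merely increases with parameter count. All function names, the search bounds, and the numbers in the usage note are assumptions chosen for the example, not values from the paper.

```python
import math
from typing import Callable


def toy_reference_curve(n_params: float) -> float:
    """Hypothetical reference curve (NOT the paper's fitted scaling law).

    Maps a parameter count to a score in (0, 1) that rises monotonically.
    """
    return 1.0 / (1.0 + (1e9 / n_params) ** 0.5)


def effective_params(target_score: float,
                     reference_curve: Callable[[float], float],
                     lo: float = 1e6,
                     hi: float = 1e13,
                     iters: int = 200) -> float:
    """Parameter size a reference model needs to reach target_score.

    Inverts a monotonically increasing curve by bisection in log space.
    The bounds lo and hi are arbitrary assumptions for this sketch.
    """
    if not reference_curve(lo) <= target_score <= reference_curve(hi):
        raise ValueError("target score lies outside the search bracket")
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if reference_curve(math.exp(mid)) < target_score:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))


def capacity_density(target_score: float,
                     actual_params: float,
                     reference_curve: Callable[[float], float]) -> float:
    """Ratio of effective parameter size to actual parameter size."""
    return effective_params(target_score, reference_curve) / actual_params


def projected_density(density_now: float,
                      months_ahead: float,
                      doubling_months: float = 3.0) -> float:
    """Extrapolate density under exponential growth with a fixed doubling time."""
    return density_now * 2.0 ** (months_ahead / doubling_months)
```

Under these toy assumptions, a target model with 2e9 actual parameters that matches the score the placeholder curve assigns to a 4e9-parameter reference model has a capacity density of about 2. With a doubling time of three months, `projected_density(1.0, 12)` gives 16.0, that is, four doublings in a year.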
