Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Abstract

Scaling laws are typically fit using a family of models with a narrow range of frozen hyperparameter choices. In this work, we study scaling laws using a wide range of architecture and hyperparameter choices and highlight their impact on the resulting prescriptions. As a primary artifact of our research, we release the Gemstones: the most comprehensive open-source scaling law dataset to date, consisting of over 4000 checkpoints from transformers with up to 2 billion parameters. These models have been trained with different learning rates, cooldown schedules, and architectural shapes. Our checkpoints enable more complex studies of scaling, such as a law that predicts language modeling performance as a function of model width and depth. By examining the various facets of our model suite, we find that the prescriptions of scaling laws can be highly sensitive to the experimental design process and the specific model checkpoints used during fitting. Code: https://github.com/mcleish7/gemstone-scaling-laws
