Benchmarking Optimizers for Large Language Model Pretraining

Abstract

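The recent development of Large Language Models (LLMs) has been accompanied by a surge of new ideas and methods for better optimizing the loss of deep learning models. The benefits these methods claim are varied, ranging from faster convergence to removing the dependence on specific hyperparameters. However, because the experimental protocols used to validate these claims differ so widely, direct comparisons between methods are difficult. This study presents a comprehensive evaluation of recent optimization methods in standardized LLM pretraining scenarios, systematically varying model size, batch size, and training duration. By carefully tuning each method, we show practitioners which optimizer is best suited to each scenario. For researchers, we identify promising directions for future optimization research. Finally, by releasing our code and making all experiments fully reproducible, we aim to support the development and rigorous benchmarking of future methods.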
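
The protocol summarized above (for every combination of model size, batch size, and training duration, tune each optimizer separately and keep the best run) can be sketched as a simple grid search. This is a minimal illustration under stated assumptions, not the authors' released code: the scenario values, the `benchmark` function, and the `train_fn` callback, which is assumed to run one pretraining job and return its final validation loss, are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical scenario axes (placeholders, not the values used in the study).
MODEL_SIZES = ["small", "medium"]
BATCH_SIZES = [32, 256]
TRAIN_STEPS = [1_000, 10_000]

Scenario = Tuple[str, int, int]  # (model size, batch size, training steps)
Result = Tuple[str, dict, float]  # (optimizer name, config, final loss)


def benchmark(
    optimizers: Dict[str, List[dict]],
    train_fn: Callable[[str, dict, Scenario], float],
) -> Dict[Scenario, Result]:
    """Tune every optimizer inside every scenario and keep the best run.

    `optimizers` maps an optimizer name to its own hyperparameter grid, so
    each method is tuned independently. `train_fn` (hypothetical) runs one
    training job and returns its final validation loss.
    """
    best: Dict[Scenario, Result] = {}
    for scenario in product(MODEL_SIZES, BATCH_SIZES, TRAIN_STEPS):
        for name, grid in optimizers.items():
            for config in grid:
                loss = train_fn(name, config, scenario)
                # Lower final loss wins; on a tie the first run seen is kept.
                if scenario not in best or loss < best[scenario][2]:
                    best[scenario] = (name, config, loss)
    return best


if __name__ == "__main__":
    # Toy stand-in for real training, for illustration only (not real data).
    def toy_train_fn(name: str, config: dict, scenario: Scenario) -> float:
        _, batch, steps = scenario
        return config["lr"] * batch / steps + (0.5 if name == "sgd" else 0.0)

    grids = {"adamw": [{"lr": 1e-3}, {"lr": 3e-4}], "sgd": [{"lr": 1e-2}]}
    for scenario, (name, config, loss) in benchmark(grids, toy_train_fn).items():
        print(scenario, name, config, round(loss, 4))
```

Because tuning is repeated inside each scenario, the winning optimizer and its hyperparameters are allowed to differ between, for example, short and long training runs, which is what makes per-scenario recommendations possible.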
