Benchmarking Optimizers for Large Language Model Pretraining
September 1, 2025
Authors: Andrei Semenov, Matteo Pagliardini, Martin Jaggi
cs.AI
Abstract
The recent development of Large Language Models (LLMs) has been accompanied
by an effervescence of novel ideas and methods to better optimize the loss of
deep learning models. The claims made for these methods are myriad: from faster
convergence to removing reliance on certain hyperparameters. However, the
diverse experimental protocols used to validate these claims make direct
comparisons between methods challenging. This study presents a comprehensive
evaluation of recent optimization techniques across standardized LLM
pretraining scenarios, systematically varying model size, batch size, and
training duration. Through careful tuning of each method, we provide guidance
to practitioners on which optimizer is best suited for each scenario. For
researchers, our work highlights promising directions for future optimization
research. Finally, by releasing our code and making all experiments fully
reproducible, we hope our efforts can help the development and rigorous
benchmarking of future methods.
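
To make the benchmarking protocol concrete, below is a minimal sketch of the kind of standardized sweep the abstract describes. It is not the authors' released code: the optimizer names, model sizes, batch sizes, and token budgets are illustrative assumptions, and the training routine itself is left as a placeholder.

    from itertools import product

    # Illustrative benchmark grid. The axes mirror those named in the
    # abstract (optimizer, model size, batch size, training duration),
    # but the specific values are assumptions, not the paper's setup.
    OPTIMIZERS = ["adamw", "lion", "sophia", "soap"]   # hypothetical candidate set
    MODEL_SIZES = [124e6, 350e6, 1.3e9]                # parameters
    BATCH_SIZES = [256, 1024]                          # sequences per step
    TOKEN_BUDGETS = [5e9, 20e9]                        # total training tokens

    def run_pretraining(optimizer: str, n_params: float,
                        batch_size: int, n_tokens: float) -> float:
        """Placeholder for one pretraining run; would return final loss."""
        raise NotImplementedError("training loop omitted in this sketch")

    def benchmark() -> dict:
        results = {}
        for opt, size, bs, tokens in product(OPTIMIZERS, MODEL_SIZES,
                                             BATCH_SIZES, TOKEN_BUDGETS):
            # Each method would be tuned (learning rate, etc.) before
            # comparison, as the abstract stresses careful per-method tuning.
            results[(opt, size, bs, tokens)] = run_pretraining(opt, size,
                                                               bs, tokens)
        return results

Holding the grid fixed across all methods, and tuning each method before comparing, is what makes results comparable across optimizers rather than artifacts of mismatched experimental protocols.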