Benchmarking Optimizers for Large Language Model Pretraining

Abstract

The recent development of Large Language Models (LLMs) has been accompanied by a surge of novel ideas and methods for better optimizing the loss of deep learning models. These methods make a wide range of claims, from faster convergence to removing reliance on certain hyperparameters. However, the experimental protocols used to validate these claims differ widely, which makes direct comparisons between methods challenging. This study presents a comprehensive evaluation of recent optimization techniques across standardized LLM pretraining scenarios, systematically varying model size, batch size, and training duration. Through careful tuning of each method, we provide guidance to practitioners on which optimizer is best suited for each scenario. For researchers, our work highlights promising directions for future optimization research. Finally, by releasing our code and making all experiments fully reproducible, we hope our efforts can support the development and rigorous benchmarking of future methods.
