Benchmarking Optimizers for Large Language Model Pretraining
September 1, 2025
Authors: Andrei Semenov, Matteo Pagliardini, Martin Jaggi
cs.AI
Abstract
The recent development of Large Language Models (LLMs) has been accompanied
by an effervescence of novel ideas and methods to better optimize the loss of
deep learning models. Claims from those methods are myriad: from faster
convergence to removing reliance on certain hyperparameters. However, the
diverse experimental protocols used to validate these claims make direct
comparisons between methods challenging. This study presents a comprehensive
evaluation of recent optimization techniques across standardized LLM
pretraining scenarios, systematically varying model size, batch size, and
training duration. Through careful tuning of each method, we provide guidance
to practitioners on which optimizer is best suited for each scenario. For
researchers, our work highlights promising directions for future optimization
research. Finally, by releasing our code and making all experiments fully
reproducible, we hope our efforts can help the development and rigorous
benchmarking of future methods.