Benchmarking Optimizers for Large Language Model Pretraining

Abstract

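The recent development of Large Language Models (LLMs) has been accompanied by a surge of new ideas and methods for optimizing the loss of deep learning models more effectively. These methods make a wide range of claims, from faster convergence to removing the dependence on certain hyperparameters. However, because these claims were validated under differing experimental protocols, direct comparisons between methods are difficult. This study comprehensively evaluates recent optimization techniques in standardized LLM pretraining scenarios, systematically varying model size, batch size, and training duration. By carefully tuning each method, we give practitioners guidance on which optimizer is best suited to each scenario. For researchers, we highlight promising directions for future optimization research. Finally, we release our code and make all experiments fully reproducible, in the hope of supporting the development and rigorous benchmarking of future methods.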
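The evaluation protocol described in the abstract (every optimizer run across a grid of model sizes, batch sizes, and training durations, each tuned before comparison) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the helper names `build_grid`, `tuned_losses`, and `best_per_scenario`, the dictionary keys, the placeholder scenario values, and the use of final loss as the comparison metric are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical record layout: each run is a dict with keys
# "optimizer", "model_size", "batch_size", "train_steps", "final_loss".
Scenario = Tuple[str, int, int]  # (model_size, batch_size, train_steps)


def build_grid(optimizers: List[str], model_sizes: List[str],
               batch_sizes: List[int], train_steps: List[int]) -> List[dict]:
    """Full-factorial grid: every optimizer is scheduled in every scenario."""
    return [
        {"optimizer": opt, "model_size": m, "batch_size": b, "train_steps": t}
        for opt, m, b, t in product(optimizers, model_sizes, batch_sizes, train_steps)
    ]


def tuned_losses(runs: List[dict]) -> Dict[Tuple[str, str, int, int], float]:
    """Keep each optimizer's lowest final loss per scenario across its
    hyperparameter sweep, so every method is compared at its tuned best."""
    tuned: Dict[Tuple[str, str, int, int], float] = {}
    for r in runs:
        key = (r["optimizer"], r["model_size"], r["batch_size"], r["train_steps"])
        tuned[key] = min(tuned.get(key, float("inf")), r["final_loss"])
    return tuned


def best_per_scenario(
    tuned: Dict[Tuple[str, str, int, int], float]
) -> Dict[Scenario, Tuple[str, float]]:
    """For each scenario, return the optimizer with the lowest tuned loss."""
    best: Dict[Scenario, Tuple[str, float]] = {}
    for (opt, m, b, t), loss in tuned.items():
        scenario = (m, b, t)
        if scenario not in best or loss < best[scenario][1]:
            best[scenario] = (opt, loss)
    return best
```

For example, `build_grid(["opt_a", "opt_b"], ["small", "large"], [256, 512], [10_000])` schedules 2 × 2 × 2 × 1 = 8 runs before any hyperparameter sweep. Taking the minimum over each optimizer's own sweep before comparing optimizers keeps a poorly tuned method from being judged unfairly against a well tuned one.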
