Cautious Optimizers: Improving Training with One Line of Code

Abstract

AdamW has been the default optimizer for transformer pretraining. For many years, our community has searched for faster and more stable optimizers, with only limited positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we rename as a Cautious Optimizer, e.g. C-AdamW and C-Lion. Our theoretical result shows that this modification preserves Adam's Hamiltonian function and does not break the convergence guarantee under Lyapunov analysis. In addition, our theoretical insight reveals a whole new family of optimizers. Among them, we pick the simplest one for empirical experiments, showing speed-ups of up to 1.47× on Llama and MAE pretraining. Code is available at https://github.com/kyleliang919/C-Optim
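
The abstract does not spell out the one-line change itself. The sketch below is an illustration only, under the assumption that the modification masks out coordinates where the base optimizer's proposed update disagrees in sign with the current gradient and rescales the remaining ones. The function name `cautious_step`, its parameters, and the use of plain Python lists (chosen to avoid any dependency) are hypothetical and are not taken from the linked repository.

```python
def cautious_step(params, update, grad, lr, eps=1e-8):
    """Return new parameters after one sign-masked update (illustrative sketch).

    params: current parameter values
    update: step direction proposed by the base momentum optimizer
    grad:   current gradient, same length as update
    lr:     learning rate
    """
    # Assumed rule: keep a coordinate only if update and gradient agree in sign.
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Assumed normalisation: divide by the fraction of kept coordinates so the
    # overall step magnitude stays comparable to the unmasked update.
    scale = 1.0 / (sum(mask) / len(mask) + eps)
    return [p - lr * u * m * scale for p, u, m in zip(params, update, mask)]


# Example: the second coordinate's update opposes the gradient, so it is skipped.
new_params = cautious_step(
    params=[1.0, 1.0], update=[0.5, -0.5], grad=[1.0, 1.0], lr=0.1
)
```

In a tensor library the same idea would presumably reduce to an elementwise mask applied to the update just before the parameter step, which is consistent with the "single-line" description above.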
