The Road Less Scheduled
May 24, 2024
Authors: Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
cs.AI
Abstract
Existing learning rate schedules that do not require specification of the
optimization stopping step T are greatly out-performed by learning rate
schedules that depend on T. We propose an approach that avoids the need for
this stopping time by eschewing the use of schedules entirely, while exhibiting
state-of-the-art performance compared to schedules across a wide family of
problems ranging from convex problems to large-scale deep learning problems.
Our Schedule-Free approach introduces no additional hyper-parameters over
standard optimizers with momentum. Our method is a direct consequence of a new
theory we develop that unifies scheduling and iterate averaging. An open source
implementation of our method is available
(https://github.com/facebookresearch/schedule_free).
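
The abstract does not spell out the update rule, so the following is only a minimal sketch of how scheduling can be replaced by iterate averaging. It assumes the three-sequence form of schedule-free SGD: a base iterate z takes constant-step SGD steps, x is a uniform running average of z, and the gradient is evaluated at an interpolation y between the two. The function name, the scalar toy objective, and the hyper-parameter values are illustrative assumptions, not taken from the text above or from the released implementation.

```python
def schedule_free_sgd(grad, x0, lr=0.25, beta=0.9, steps=1000):
    """Illustrative schedule-free SGD on a scalar problem (assumed form).

    z: base iterate updated by plain SGD with a constant step size
    x: uniform running average of z, returned as the solution
    y: interpolation between z and x where the gradient is evaluated
    """
    z = x = float(x0)
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # gradient evaluation point
        z = z - lr * grad(y)             # constant learning rate, no schedule
        c = 1.0 / (t + 1)                # uniform averaging weight
        x = (1.0 - c) * x + c * z        # running average of the z sequence
    return x


def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2 (hypothetical example).
    return 2.0 * (w - 3.0)


x_final = schedule_free_sgd(grad, x0=0.0)
```

Note that no stopping step T appears anywhere in the loop: the averaging weight depends only on the current step count, so the run can be halted at any point and x used as the result.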