The Road Less Scheduled
May 24, 2024
Authors: Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
cs.AI
Abstract
Existing learning rate schedules that do not require specification of the
optimization stopping step T are greatly out-performed by learning rate
schedules that depend on T. We propose an approach that avoids the need for
this stopping time by eschewing the use of schedules entirely, while exhibiting
state-of-the-art performance compared to schedules across a wide family of
problems ranging from convex problems to large-scale deep learning problems.
Our Schedule-Free approach introduces no additional hyper-parameters over
standard optimizers with momentum. Our method is a direct consequence of a new
theory we develop that unifies scheduling and iterate averaging. An open source
implementation of our method is available
(https://github.com/facebookresearch/schedule_free).
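
The abstract does not spell out the update rule. As a rough illustration of how iterate averaging can stand in for a schedule, here is a minimal pure-Python sketch, assuming the SGD form of the method: a base sequence `z` takes plain gradient steps, the gradient is evaluated at an interpolation `y` between `z` and the running average `x`, and `x` is the point returned for evaluation. The function name `schedule_free_sgd`, its default values, and the toy quadratic objective are assumptions made for this example and are not taken from the linked repository.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=1000):
    """Illustrative schedule-free SGD loop on a list of floats.

    grad  : function mapping a parameter list to a gradient list
    x0    : initial parameters
    lr    : constant learning rate (no schedule, no stopping step T needed)
    beta  : interpolation weight, playing the role of momentum
    """
    z = list(x0)  # base SGD sequence
    x = list(x0)  # uniform running average of z, used for evaluation
    for t in range(1, steps + 1):
        # Evaluate the gradient at a point between z and the average x.
        y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Plain SGD step on the base sequence with a constant step size.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Update the running average with weight 1 / (t + 1).
        c = 1.0 / (t + 1)
        x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


# Toy usage: minimize 0.5 * ||w - target||^2, whose gradient is w - target.
target = [3.0, -2.0]
w = schedule_free_sgd(lambda y: [yi - ti for yi, ti in zip(y, target)],
                      [0.0, 0.0], steps=2000)
print(w)
```

The loop never refers to a total step count, so it can be stopped at any iteration and `x` read off as the current estimate; `lr` and `beta` are the only tunable values, matching the claim that no hyper-parameters are added beyond those of a standard momentum optimizer.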