Stepsize Anything: A Unified Learning Rate Schedule for Budgeted-Iteration Training

Abstract

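As computational costs keep rising and resources remain limited, budgeted-iteration training, which aims to achieve optimal learning within a predetermined iteration budget, is becoming increasingly important. Although the learning rate schedule fundamentally determines performance across networks and tasks, especially in budgeted-iteration settings, its design still relies mainly on heuristics and lacks theoretical support. Moreover, selecting the optimal learning rate schedule requires extensive trial and error, which makes training inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge this gap by constructing a novel training budget-aware optimization framework that explicitly accounts for robustness to variations in landscape curvature. From this framework, we derive the UBA schedule, which is controlled by a single hyperparameter φ that offers a trade-off between flexibility and simplicity and removes the need for per-network numerical optimization. We further establish a theoretical connection between φ and the condition number, which adds interpretability and justification to our approach. We also prove convergence for different values of φ and provide practical guidelines for choosing φ based on theoretical analysis and empirical results. Extensive experiments show that UBA consistently surpasses commonly used schedules across vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.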

English

Rising computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic and lacks theoretical foundations. In addition, selecting the optimal learning rate schedule requires extensive trial and error, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge this gap by constructing a novel training budget-aware optimization framework that explicitly accounts for robustness to landscape curvature variations. From this framework, we derive the UBA schedule, which is controlled by a single hyperparameter φ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between φ and the condition number, adding interpretation and justification to our approach. We also prove convergence for different values of φ and offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses commonly used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.