The Hidden Power of Scaling Factor in LoRA Optimization
June 11, 2026
Authors: Zicheng Zhang, Haoran Li, Jiaxing Wang, Guoqiang Gong, Anqi Li, Yudong Hu, Ting Xiong, Yurong Gao, Junxing Hu, Zhida Jiang, Yifeng Zhang, Pengzhang Liu, Qixia Jiang
cs.AI
Abstract
In Low-Rank Adaptation (LoRA), the scaling factor α is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we show that α and the learning rate play different roles: α is the dominant driver of effective optimization and delivers gains that learning-rate scaling alone cannot replicate. Combining extensive empirical analysis with a theoretical Signal-Drift framework, we report three findings about LoRA's scaling mechanism. First, LoRA's spectral suppression smooths the optimization landscape, which makes standard hyperparameters overly conservative and creates an optimization gap. Second, when this smoothness is exploited to accelerate convergence, α outperforms the learning rate because it amplifies the task signal without increasing the drift ratio. Third, the optimal scaling factor grows sublinearly with the rank and is well characterized by a square-root law with an unexpectedly large coefficient, which shows that existing rank-tied heuristics scale α insufficiently. Based on these insights, we propose LoRA-α, a minimalist framework that restores α to its principled regime and makes LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks demonstrate that LoRA-α consistently improves performance while streamlining hyperparameter search, unlocking the learning potential of LoRA.
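To make the role of α concrete, the following is a minimal sketch of where the scaling factor enters a LoRA layer and what a square-root rule in the rank could look like. It assumes the conventional LoRA parameterization, in which the low-rank update is multiplied by α/r; the abstract does not state which parameterization LoRA-α uses. The helper `sqrt_rule_alpha` and its coefficient `c` are hypothetical placeholders: the abstract reports only that the coefficient is unexpectedly large, not its value.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Conventional LoRA forward pass: y = x W^T + (alpha / r) * x A^T B^T.

    W: frozen weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = A.shape[0]  # rank of the adapter
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

def sqrt_rule_alpha(r, c):
    """Hypothetical square-root rule alpha = c * sqrt(r).

    The coefficient c is an illustrative placeholder, not a value from the paper.
    """
    return c * np.sqrt(r)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 4
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # random init, as in standard LoRA
B = np.zeros((d_out, r))             # zero init: the adapter is a no-op at step 0
x = rng.normal(size=(2, d_in))

alpha = sqrt_rule_alpha(r, c=2.0)    # c=2.0 is chosen only for illustration
y = lora_forward(x, W, A, B, alpha)
```

Because `B` starts at zero, the output initially equals the frozen layer's output regardless of α; α only changes how strongly the adapter's learned update contributes once training moves `B` away from zero.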