The Hidden Power of the Scaling Factor in LoRA Optimization

Abstract

In Low-Rank Adaptation (LoRA), the scaling factor α is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we show that α and the learning rate play different roles: α emerges as the dominant driver of effective optimization, delivering gains that cannot be replicated by adjusting the learning rate alone. Combining extensive empirical analysis with a theoretical Signal-Drift framework, we report three findings about LoRA's scaling mechanism. First, LoRA's spectral suppression smooths the optimization landscape, which renders standard hyperparameter settings overly conservative and creates an optimization gap. Second, when this smoothness is leveraged to accelerate convergence, α outperforms the learning rate because it amplifies the task signal without increasing the drift ratio. Third, the optimal scaling factor grows sublinearly with the rank and is well characterized by a square-root law with an unexpectedly large coefficient, which reveals that existing rank-tied heuristics scale α insufficiently. Based on these insights, we propose LoRA-α, a minimalist framework that restores α to its principled regime and makes LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks show that LoRA-α consistently improves performance while streamlining hyperparameter search, unlocking the learning potential of LoRA.
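
As a minimal sketch of why α and the learning rate need not be interchangeable, the following NumPy example takes one plain-SGD step from the common initialization B = 0 and compares doubling the learning rate with doubling α. It assumes the usual LoRA parameterization W + (α/r)·BA and plain SGD. It is an illustration only, not the Signal-Drift analysis itself, and adaptive optimizers such as Adam behave differently. The names `first_sgd_step_delta` and `sqrt_law_alpha`, the gradient `G`, and the coefficient passed to `sqrt_law_alpha` are hypothetical; no coefficient value is stated in the abstract.

```python
import numpy as np


def lora_delta(A, B, alpha, rank):
    """Effective weight update of a LoRA layer: (alpha / rank) * B @ A."""
    return (alpha / rank) * (B @ A)


def first_sgd_step_delta(G, A, alpha, rank, lr):
    """Weight change after one plain-SGD step starting from B = 0.

    G stands for dL/dW at the frozen weight (a hypothetical input).
    With W_eff = W + s * B @ A and s = alpha / rank, dL/dB = s * G @ A.T.
    """
    s = alpha / rank
    # B starts at zero, so a single SGD step leaves B = -lr * dL/dB.
    B = -lr * s * (G @ A.T)
    # dL/dA = s * B.T @ G is zero at B = 0, so A is unchanged by this step.
    return lora_delta(A, B, alpha, rank)


def sqrt_law_alpha(rank, coeff):
    """Hypothetical helper: alpha = coeff * sqrt(rank).

    coeff is a placeholder to be tuned; it is not a value from the source.
    """
    return coeff * np.sqrt(rank)


rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 5, 2
G = rng.standard_normal((d_out, d_in))  # stand-in gradient
A = rng.standard_normal((rank, d_in))   # random down-projection

base = first_sgd_step_delta(G, A, alpha=4.0, rank=rank, lr=0.1)
double_lr = first_sgd_step_delta(G, A, alpha=4.0, rank=rank, lr=0.2)
double_alpha = first_sgd_step_delta(G, A, alpha=8.0, rank=rank, lr=0.1)

# Relative size of the first-step update under each change.
ratio_lr = np.linalg.norm(double_lr) / np.linalg.norm(base)
ratio_alpha = np.linalg.norm(double_alpha) / np.linalg.norm(base)
print(round(ratio_lr, 6), round(ratio_alpha, 6))
```

Under these assumptions the first-step update is linear in the learning rate but quadratic in α, so `ratio_lr` is about 2 while `ratio_alpha` is about 4. The `sqrt_law_alpha` helper only shows the functional form of a square-root rule; its coefficient would have to be chosen empirically.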