LoRA 최적화에서 스케일링 팩터의 숨겨진 힘

초록

저차원 적응(LoRA)에서 스케일링 인자 α는 종종 학습률에 대한 단순한 보조 수단으로 간주되지만, 최적화에서의 역할은 여전히 제대로 이해되지 않고 있다. 본 논문에서는 스케일링 인자 α와 학습률이 서로 다른 기능을 수행하며, α가 효과적 최적화의 주요 동인으로 부상하여 학습률 스케일링만으로는 복제할 수 없는 이득을 제공한다는 점을 밝힌다. 광범위한 실증 분석과 이론적 신호-드리프트 프레임워크의 시너지를 통해 LoRA의 스케일링 메커니즘에 대한 세 가지 발견을 제시한다. 첫째, LoRA의 스펙트럼 억제는 최적화 지형을 평탄화하여 표준 하이퍼파라미터를 지나치게 보수적으로 만들고 최적화 격차를 발생시킨다. 둘째, 이러한 평탄성을 활용하여 수렴을 가속화할 때 α는 드리프트 비율을 증가시키지 않으면서 작업 신호를 증폭하여 학습률보다 우수한 성능을 보인다. 셋째, 최적 스케일링 인자는 랭크와 서브리니어 관계를 가지며, 예상보다 큰 계수를 가진 제곱근 법칙으로 잘 특성화되어, 기존의 랭크 기반 휴리스틱의 스케일링 부족을 드러낸다. 이러한 통찰을 바탕으로 α를 원칙적 체계로 복원하여 LoRA를 표준적인 작은 학습률과 호환되게 하는 미니멀리스트 프레임워크인 LoRA-α를 제안한다. 다양한 작업에 걸친 광범위한 평가는 LoRA-α가 하이퍼파라미터 탐색을 간소화하면서 일관되게 성능을 향상시켜 LoRA의 학습 잠재력을 극대화함을 보여준다.

English

In Low-Rank Adaptation (LoRA), the scaling factor α is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we reveal that the scaling factor α and the learning rate function differently, with α emerging as the dominant driver of effective optimization, delivering gains that cannot be replicated by learning rate scaling alone. Through the synergy of extensive empirical analysis and a theoretical Signal-Drift framework, we uncover three findings into LoRA's scaling mechanism: First, LoRA's spectral suppression smooths the optimization landscape, rendering standard hyperparameters overly conservative and creating an optimization gap. Second, when leveraging this smoothness to accelerate convergence, α outperforms the learning rate by amplifying the task signal without increasing the drift ratio. Third, the optimal scaling factor follows a sublinear relationship with the rank, well characterized by a square-root law with an unexpectedly large coefficient, revealing the insufficient scaling of existing rank-tied heuristics. Based on these insights, we propose LoRA-α, a minimalist framework that restores α to its principled regime, making LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks demonstrate that LoRA-α consistently improves performance while streamlining hyperparameter search, unleashing the learning potential of LoRA.