FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
May 19, 2025
Authors: Seanie Lee, Sangwoo Park, Dong Bok Lee, Dominik Wagner, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang
cs.AI
Abstract
Low-Rank Adaptation (LoRA), which introduces a product of two trainable
low-rank matrices into frozen pre-trained weights, is widely used for efficient
fine-tuning of language models in federated learning (FL). However, when
combined with differentially private stochastic gradient descent (DP-SGD), LoRA
faces substantial noise amplification: DP-SGD perturbs per-sample gradients,
and the matrix multiplication of the LoRA update (BA) intensifies this
effect. Freezing one matrix (e.g., A) reduces the noise but restricts model
expressiveness, often resulting in suboptimal adaptation. To address this, we
propose FedSVD, a simple yet effective method that introduces a global
reparameterization based on singular value decomposition (SVD). In our
approach, each client optimizes only the B matrix and transmits it to the
server. The server aggregates the B matrices, computes the product BA using
the previous A, and refactorizes the result via SVD. This yields a new
adaptive A composed of the orthonormal right singular vectors of BA, and an
updated B containing the remaining SVD components. This reparameterization
avoids quadratic noise amplification, while allowing A to better capture the
principal directions of the aggregate updates. Moreover, the orthonormal
structure of A bounds the gradient norms of B and preserves more signal
under DP-SGD, as confirmed by our theoretical analysis. As a result, FedSVD
consistently improves stability and performance across a variety of privacy
settings and benchmarks, outperforming relevant baselines under both private
and non-private regimes.
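To make the server-side step concrete, here is a minimal NumPy sketch of the aggregation-and-SVD refactorization described in the abstract. The function and variable names (`fedsvd_refactorize`, `B_clients`, `A_prev`, `rank`) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def fedsvd_refactorize(B_clients, A_prev, rank):
    """One FedSVD server round (sketch): aggregate the client B matrices,
    form the low-rank update BA with the previous A, and refactorize it
    via SVD so the new A holds orthonormal right singular vectors."""
    # Average the B matrices received from clients, each of shape (d_out, r).
    B_agg = np.mean(B_clients, axis=0)
    # Form the full low-rank update using the previous A of shape (r, d_in).
    delta_W = B_agg @ A_prev
    # Thin SVD of the rank-<=r update: delta_W = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(delta_W, full_matrices=False)
    # New A: the top-r orthonormal right singular vectors (rows of Vt).
    A_new = Vt[:rank]
    # New B absorbs the remaining factors, so B_new @ A_new reconstructs delta_W.
    B_new = U[:, :rank] * S[:rank]
    return B_new, A_new
```

Because `A_new` has orthonormal rows (`A_new @ A_new.T` equals the identity), multiplying client gradients by `A_new.T` cannot inflate their norm, which is the property the abstract credits for bounding the gradients of B and preserving more signal under DP-SGD.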