FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
May 19, 2025
Authors: Seanie Lee, Sangwoo Park, Dong Bok Lee, Dominik Wagner, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang
cs.AI
Abstract
Low-Rank Adaptation (LoRA), which introduces a product of two trainable
low-rank matrices into frozen pre-trained weights, is widely used for efficient
fine-tuning of language models in federated learning (FL). However, when
combined with differentially private stochastic gradient descent (DP-SGD), LoRA
faces substantial noise amplification: DP-SGD perturbs per-sample gradients,
and the matrix multiplication of the LoRA update (BA) intensifies this
effect. Freezing one matrix (e.g., A) reduces the noise but restricts model
expressiveness, often resulting in suboptimal adaptation. To address this, we
propose FedSVD, a simple yet effective method that introduces a global
reparameterization based on singular value decomposition (SVD). In our
approach, each client optimizes only the B matrix and transmits it to the
server. The server aggregates the B matrices, computes the product BA using
the previous A, and refactorizes the result via SVD. This yields a new
adaptive A composed of the orthonormal right singular vectors of BA, and an
updated B containing the remaining SVD components. This reparameterization
avoids quadratic noise amplification, while allowing A to better capture the
principal directions of the aggregate updates. Moreover, the orthonormal
structure of A bounds the gradient norms of B and preserves more signal
under DP-SGD, as confirmed by our theoretical analysis. As a result, FedSVD
consistently improves stability and performance across a variety of privacy
settings and benchmarks, outperforming relevant baselines under both private
and non-private regimes.
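The server-side refactorization is the core of the method and is fully specified in the abstract, so a minimal sketch may help make it concrete. Because each client perturbs only B under DP-SGD, the transmitted update takes the form (B + ξ)A = BA + ξA, which is linear in the noise ξ; training both factors would instead introduce a ξ_B ξ_A cross term, the quadratic amplification the abstract refers to. Below is an illustrative NumPy sketch of the aggregation-and-SVD step, assuming the function name fedsvd_refactorize, simple mean aggregation, and the stated matrix shapes; this is a sketch of the described procedure, not the authors' implementation.

```python
import numpy as np

def fedsvd_refactorize(client_Bs, A_prev):
    """Sketch of the server-side FedSVD step: aggregate client B matrices,
    form B @ A with the previous A, and refactorize the product via SVD.

    client_Bs: list of (d_out, r) client updates (hypothetical shapes)
    A_prev:    (r, d_in) matrix used by all clients in the last round
    """
    # Aggregate the trainable B matrices (mean aggregation assumed here).
    B_agg = np.mean(client_Bs, axis=0)                 # (d_out, r)

    # Form the full low-rank update using the previous A.
    BA = B_agg @ A_prev                                # (d_out, d_in), rank <= r

    # Reduced SVD of the aggregated update.
    U, S, Vt = np.linalg.svd(BA, full_matrices=False)
    r = A_prev.shape[0]

    # New adaptive A: top-r orthonormal right singular vectors of BA.
    A_new = Vt[:r, :]                                  # (r, d_in), orthonormal rows

    # Updated B absorbs the remaining SVD components (U and singular values),
    # so B_new @ A_new reconstructs the rank-r part of BA.
    B_new = U[:, :r] * S[:r]                           # (d_out, r)
    return B_new, A_new
```

Since rank(BA) ≤ r, the top-r SVD reconstructs B_agg @ A_prev exactly, so the reparameterization changes only the factorization, not the aggregate update itself; the orthonormal rows of the new A are what bound the gradient norms of B in the next round, per the abstract's theoretical claim.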