FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA

Abstract

Low-Rank Adaptation (LoRA), which adds the product of two trainable low-rank matrices to frozen pre-trained weights, is widely used for efficient fine-tuning of language models in federated learning (FL). However, when combined with differentially private stochastic gradient descent (DP-SGD), LoRA suffers substantial noise amplification: DP-SGD perturbs per-sample gradients, and the matrix multiplication in the LoRA update (BA) intensifies this effect. Freezing one matrix (e.g., A) reduces the noise but restricts model expressiveness, often resulting in suboptimal adaptation. To address this, we propose FedSVD, a simple yet effective method that introduces a global reparameterization based on singular value decomposition (SVD). In our approach, each client optimizes only the B matrix and transmits it to the server. The server aggregates the B matrices, computes the product BA using the previous A, and refactorizes the result via SVD. This yields a new adaptive A composed of the orthonormal right singular vectors of BA, and an updated B containing the remaining SVD components. This reparameterization avoids quadratic noise amplification while allowing A to better capture the principal directions of the aggregate updates. Moreover, the orthonormal structure of A bounds the gradient norms of B and preserves more signal under DP-SGD, as confirmed by our theoretical analysis. As a result, FedSVD consistently improves stability and performance across a variety of privacy settings and benchmarks, outperforming relevant baselines under both private and non-private regimes.
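The server-side step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `fedsvd_server_step`, the use of a plain unweighted mean for aggregation, and the shape convention (B is d_out x r, A is r x d_in) are assumptions made for illustration only; the abstract does not specify them.

```python
import numpy as np

def fedsvd_server_step(client_Bs, A_prev):
    """Hypothetical single server round: aggregate B, form BA, refactorize via SVD.

    client_Bs: list of arrays of shape (d_out, r), one per client (assumed layout).
    A_prev:    array of shape (r, d_in), the A matrix used in the previous round.
    """
    r = A_prev.shape[0]
    # Assumption: aggregation is an unweighted average of the client B matrices.
    B_avg = np.mean(client_Bs, axis=0)
    # Low-rank update from the previous parameterization; its rank is at most r.
    delta_W = B_avg @ A_prev
    # Thin SVD: delta_W = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(delta_W, full_matrices=False)
    # New A: top-r right singular vectors, so its rows are orthonormal.
    A_new = Vt[:r]
    # New B: the remaining SVD components (left singular vectors scaled by S).
    B_new = U[:, :r] * S[:r]
    return B_new, A_new

# Usage with small random matrices (3 clients, rank 4).
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 16))
client_Bs = [rng.standard_normal((8, 4)) for _ in range(3)]
B_new, A_new = fedsvd_server_step(client_Bs, A_prev)
```

Because the aggregated product has rank at most r, keeping only the top r singular triplets leaves the product unchanged up to floating-point error: `B_new @ A_new` matches the aggregated `B_avg @ A_prev`, while `A_new @ A_new.T` is the r x r identity.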
