Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models

Abstract

Fine-tuning large language models (LLMs) on devices is attracting increasing interest. Recent work has combined low-rank adaptation (LoRA) with federated fine-tuning to mitigate challenges associated with device model sizes and data scarcity. Still, the heterogeneity of computational resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying device capabilities constrain LoRA's feasible rank range. Existing approaches to this issue either lack analytical justification or impose additional computational overhead, leaving a wide gap for an efficient and theoretically grounded solution. To address these challenges, we propose federated sketching LoRA (FSLoRA), which leverages a sketching mechanism to enable devices to selectively update submatrices of global LoRA modules maintained by the server. By adjusting the sketching ratios, which determine the ranks of the submatrices on the devices, FSLoRA flexibly adapts to device-specific communication and computational constraints. We provide a rigorous convergence analysis of FSLoRA that characterizes how the sketching ratios affect the convergence rate. Through comprehensive experiments on multiple datasets and LLMs, we demonstrate FSLoRA's superior performance compared to various baselines.
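
To make the idea of updating submatrices of a global LoRA module concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a global module stored as a d x r matrix `B` and an r x k matrix `A`, assumes a device picks `round(sketch_ratio * r)` rank components uniformly at random, and replaces local training with a fixed nudge. All function names are hypothetical, and server-side aggregation and any rescaling are omitted because the abstract does not specify them.

```python
import random


def sample_rank_indices(global_rank, sketch_ratio, rng):
    """Choose which rank components a device trains (assumed uniform sampling)."""
    local_rank = max(1, round(sketch_ratio * global_rank))
    return sorted(rng.sample(range(global_rank), local_rank))


def extract_submodule(B, A, idx):
    """Keep columns idx of B (d x r) and rows idx of A (r x k)."""
    B_sub = [[row[j] for j in idx] for row in B]
    A_sub = [list(A[j]) for j in idx]
    return B_sub, A_sub


def write_back(B, A, B_sub, A_sub, idx):
    """Copy updated submatrices into full-size copies; other entries stay unchanged."""
    B_new = [list(row) for row in B]
    A_new = [list(row) for row in A]
    for pos, j in enumerate(idx):
        for i in range(len(B_new)):
            B_new[i][j] = B_sub[i][pos]
        A_new[j] = list(A_sub[pos])
    return B_new, A_new


rng = random.Random(0)
d, k, r = 4, 3, 8                      # illustrative sizes, not from the paper
B = [[0.0] * r for _ in range(d)]      # global d x r factor
A = [[1.0] * k for _ in range(r)]      # global r x k factor

# A resource-limited device with sketching ratio 0.25 trains 2 of 8 components.
idx = sample_rank_indices(r, 0.25, rng)
B_sub, A_sub = extract_submodule(B, A, idx)

# Stand-in for local training: shift every entry of the selected submatrices.
B_sub = [[v + 0.1 for v in row] for row in B_sub]
A_sub = [[v - 0.1 for v in row] for row in A_sub]

B_new, A_new = write_back(B, A, B_sub, A_sub, idx)
```

In this sketch a smaller sketching ratio means the device stores and transmits fewer columns of `B` and rows of `A`, which is the sense in which the ratio controls the device-side rank.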
