

Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models

January 31, 2025
作者: Wenzhi Fang, Dong-Jun Han, Liangqi Yuan, Seyyedali Hosseinalipour, Christopher G. Brinton
cs.AI

Abstract

Fine-tuning large language models (LLMs) on devices is attracting increasing interest. Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with device model sizes and data scarcity. Still, the heterogeneity of computational resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying device capabilities constrain LoRA's feasible rank range. Existing approaches to this problem either lack analytical justification or impose additional computational overhead, leaving a wide gap for an efficient, theoretically grounded solution. To address these challenges, we propose federated sketching LoRA (FSLoRA), which leverages a sketching mechanism to enable devices to selectively update submatrices of global LoRA modules maintained by the server. By adjusting the sketching ratios, which determine the ranks of the submatrices on the devices, FSLoRA flexibly adapts to device-specific communication and computational constraints. We provide a rigorous convergence analysis of FSLoRA that characterizes how the sketching ratios affect the convergence rate. Through comprehensive experiments on multiple datasets and LLMs, we demonstrate FSLoRA's superior performance compared to various baselines.
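To make the sketching mechanism concrete, below is a minimal, self-contained sketch of the idea as described in the abstract: a device with sketching ratio s updates only a sampled rank-s submatrix (a subset of columns of B and the matching rows of A) of the server's global LoRA factors. This is illustrative only, not the authors' FSLoRA implementation; names such as `device_update` and `sketch_ratio` are our own, and a toy matrix-factorization objective stands in for an actual LLM fine-tuning loss.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 16              # weight dims and global LoRA rank
B = np.zeros((d, r))              # global LoRA factor (down-projection), zero-initialized
A = rng.normal(0.0, 0.1, (r, k))  # global LoRA factor (up-projection)

def device_update(B, A, sketch_ratio, grad_fn, lr=1e-2):
    """One local step: update only a randomly sketched rank-s submatrix of (B, A)."""
    s = max(1, int(round(sketch_ratio * r)))    # device-specific sub-rank
    idx = rng.choice(r, size=s, replace=False)  # sampled rank components
    gW = grad_fn(B @ A)                         # gradient w.r.t. the LoRA update delta-W
    gB = gW @ A[idx, :].T                       # chain rule, restricted to sampled columns of B
    gA = B[:, idx].T @ gW                       # chain rule, restricted to sampled rows of A
    B[:, idx] -= lr * gB
    A[idx, :] -= lr * gA

# Toy objective standing in for the fine-tuning loss: pull delta-W toward a target
target = rng.normal(size=(d, k))
grad_fn = lambda W: W - target                  # gradient of 0.5 * ||W - target||_F^2

print("initial residual:", np.linalg.norm(B @ A - target))
for step in range(500):
    device_update(B, A, sketch_ratio=0.25, grad_fn=grad_fn)  # rank-4 local updates
print("final residual:  ", np.linalg.norm(B @ A - target))
```

With `sketch_ratio=1.0` every rank component is updated and this reduces to an ordinary LoRA-style gradient step; smaller ratios trade per-step progress for lower per-device compute and communication, which is the rank-versus-convergence trade-off the paper's analysis characterizes.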
