

Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

December 11, 2023
Authors: Zhen Qin, Daoyuan Chen, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng
cs.AI

Abstract

Pre-trained large language models (LLMs) require fine-tuning to improve their responsiveness to natural language instructions. Federated learning (FL) offers a way to perform fine-tuning using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance heights possible with full-parameter tuning. However, the communication overhead associated with full-parameter tuning is prohibitively high for both servers and clients. This work introduces FedKSeed, a novel approach that employs zeroth-order optimization (ZOO) with a set of random seeds. It enables federated full-parameter tuning of billion-sized LLMs directly on devices. Our method significantly reduces transmission requirements between the server and clients to just a few scalar gradients and random seeds, amounting to only a few thousand bytes. Building on this, we develop a strategy to assess the significance of ZOO perturbations for FL, allowing for probability-differentiated seed sampling. This prioritizes perturbations that have a greater impact on model accuracy. Experiments across six scenarios with different LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in terms of both communication efficiency and new task generalization.
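The core communication trick described above, sending only a random seed plus a scalar directional gradient instead of full model parameters, can be illustrated with a minimal NumPy sketch. This is not the authors' FedKSeed implementation: the function names (`zoo_step`, `replay_update`), the hyperparameters, and the quadratic toy loss are illustrative assumptions, and the actual method applies the idea to billion-parameter LLM weights together with a fixed candidate seed pool and probability-differentiated seed sampling.

```python
import numpy as np

def zoo_step(params, loss_fn, seed, eps=1e-3, lr=1e-4):
    """One zeroth-order (SPSA-style) update on `params` (a NumPy array).

    The perturbation direction is drawn deterministically from `seed`, so the
    client only needs to transmit (seed, scalar_grad) for others to replay it.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)               # perturbation direction
    scalar_grad = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    params -= lr * scalar_grad * z                       # in-place local update
    return seed, float(scalar_grad)                      # a few bytes to send

def replay_update(params, seed, scalar_grad, lr=1e-4):
    """Reproduce a peer's update from its transmitted (seed, scalar_grad) pair
    by regenerating the same perturbation locally."""
    z = np.random.default_rng(seed).standard_normal(params.shape)
    params -= lr * scalar_grad * z
    return params

if __name__ == "__main__":
    # Toy usage on a quadratic objective standing in for the LLM loss.
    w = np.zeros(16)
    target = np.ones(16)
    loss = lambda p: float(np.sum((p - target) ** 2))
    history = [zoo_step(w, loss, seed) for seed in range(3000)]  # client-side tuning
    w_replica = np.zeros(16)
    for seed, g in history:                                      # server-side replay
        replay_update(w_replica, seed, g)
    assert np.allclose(w, w_replica)
```

The replay step is what keeps traffic to a handful of kilobytes: any party holding the same starting model can reconstruct every perturbation from its seed, so full parameter vectors never cross the network.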