Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
December 11, 2023
Authors: Zhen Qin, Daoyuan Chen, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng
cs.AI
Abstract
Pre-trained large language models (LLMs) require fine-tuning to improve their
responsiveness to natural language instructions. Federated learning (FL) offers
a way to perform fine-tuning using the abundant data on end devices without
compromising data privacy. Most existing federated fine-tuning methods for LLMs
rely on parameter-efficient fine-tuning techniques, which may not reach the
performance heights possible with full-parameter tuning. However, the
communication overhead associated with full-parameter tuning is prohibitively
high for both servers and clients. This work introduces FedKSeed, a novel
approach that employs zeroth-order optimization (ZOO) with a set of random
seeds. It enables federated full-parameter tuning of billion-sized LLMs
directly on devices. Our method significantly reduces transmission requirements
between the server and clients to just a few scalar gradients and random seeds,
amounting to only a few thousand bytes. Building on this, we develop a strategy
to assess the significance of ZOO perturbations for FL, allowing for
probability-differentiated seed sampling. This prioritizes perturbations that
have a greater impact on model accuracy. Experiments across six scenarios with
different LLMs, datasets and data partitions demonstrate that our approach
outperforms existing federated LLM fine-tuning methods in terms of both
communication efficiency and new task generalization.
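The abstract describes communicating only random seeds and scalar gradients instead of full model weights. The toy sketch below (Python with NumPy, a quadratic stand-in loss, hypothetical names such as directional_grad and replay_updates, and a uniform seed-probability vector in place of the paper's importance-based sampling) illustrates the general idea of seed-indexed zeroth-order updates that can be replayed from (seed, scalar) pairs alone; it is an assumption-laden illustration, not the FedKSeed implementation.

```python
import numpy as np

def toy_loss(params: np.ndarray) -> float:
    """Stand-in for a model's training loss on one local batch (hypothetical)."""
    return float(np.sum((params - 1.0) ** 2))

def directional_grad(params, seed, eps=1e-3):
    """Scalar projected gradient along the perturbation regenerated from `seed`."""
    z = np.random.default_rng(seed).standard_normal(params.shape)
    return (toy_loss(params + eps * z) - toy_loss(params - eps * z)) / (2 * eps)

def replay_updates(init_params, history, lr=0.05):
    """Rebuild a model from (seed, scalar gradient) pairs, never shipping weights."""
    params = init_params.copy()
    for seed, g in history:
        z = np.random.default_rng(seed).standard_normal(params.shape)
        params -= lr * g * z
    return params

# ----- client side ----------------------------------------------------------
candidate_seeds = np.arange(4096)            # finite seed pool agreed with the server
seed_probs = np.full(len(candidate_seeds), 1.0 / len(candidate_seeds))
# seed_probs could be re-weighted over rounds to favour more informative perturbations.

sampler = np.random.default_rng(0)
params = np.zeros(8)
history = []                                 # the entire uplink message
for _ in range(300):
    seed = int(sampler.choice(candidate_seeds, p=seed_probs))
    g = directional_grad(params, seed)
    z = np.random.default_rng(seed).standard_normal(params.shape)
    params -= 0.05 * g * z
    history.append((seed, g))                # a few bytes per step, independent of model size

# ----- server side ----------------------------------------------------------
rebuilt = replay_updates(np.zeros(8), history)   # same trajectory, rebuilt from the tiny message
assert np.allclose(rebuilt, params)
print(f"loss after replay: {toy_loss(rebuilt):.4f}")
```

Because each logged entry is one integer seed plus one scalar, the message size grows with the number of optimization steps rather than with the number of model parameters, which is what keeps the per-round communication cost in the kilobyte range even for billion-parameter models.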