Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
December 11, 2023
Authors: Zhen Qin, Daoyuan Chen, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng
cs.AI
Abstract
Pre-trained large language models (LLMs) require fine-tuning to improve their
responsiveness to natural language instructions. Federated learning (FL) offers
a way to perform fine-tuning using the abundant data on end devices without
compromising data privacy. Most existing federated fine-tuning methods for LLMs
rely on parameter-efficient fine-tuning techniques, which may not reach the
performance heights possible with full-parameter tuning. However, the
communication overhead associated with full-parameter tuning is prohibitively
high for both servers and clients. This work introduces FedKSeed, a novel
approach that employs zeroth-order optimization (ZOO) with a set of random
seeds. It enables federated full-parameter tuning of billion-sized LLMs
directly on devices. Our method significantly reduces transmission requirements
between the server and clients to just a few scalar gradients and random seeds,
amounting to only a few thousand bytes. Building on this, we develop a strategy
to assess the significance of ZOO perturbations for FL, allowing for
probability-differentiated seed sampling. This prioritizes perturbations that
have a greater impact on model accuracy. Experiments across six scenarios with
different LLMs, datasets and data partitions demonstrate that our approach
outperforms existing federated LLM fine-tuning methods in terms of both
communication efficiency and new task generalization.
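The abstract describes communicating only random seeds and scalar gradients instead of full model weights. The toy sketch below (Python with NumPy, a quadratic stand-in loss, hypothetical names such as directional_grad and replay_updates, and a uniform seed-probability vector in place of the paper's importance-based sampling) illustrates the general idea of seed-indexed zeroth-order updates that can be replayed from (seed, scalar) pairs alone; it is an assumption-laden illustration, not the FedKSeed implementation.

```python
import numpy as np

def toy_loss(params: np.ndarray) -> float:
    """Stand-in for a model's training loss on one local batch (hypothetical)."""
    return float(np.sum((params - 1.0) ** 2))

def directional_grad(params, seed, eps=1e-3):
    """Scalar projected gradient along the perturbation regenerated from `seed`."""
    z = np.random.default_rng(seed).standard_normal(params.shape)
    return (toy_loss(params + eps * z) - toy_loss(params - eps * z)) / (2 * eps)

def replay_updates(init_params, history, lr=0.05):
    """Rebuild a model from (seed, scalar gradient) pairs, never shipping weights."""
    params = init_params.copy()
    for seed, g in history:
        z = np.random.default_rng(seed).standard_normal(params.shape)
        params -= lr * g * z
    return params

# ----- client side ----------------------------------------------------------
candidate_seeds = np.arange(4096)            # finite seed pool agreed with the server
seed_probs = np.full(len(candidate_seeds), 1.0 / len(candidate_seeds))
# seed_probs could be re-weighted over rounds to favour more informative perturbations.

sampler = np.random.default_rng(0)
params = np.zeros(8)
history = []                                 # the entire uplink message
for _ in range(300):
    seed = int(sampler.choice(candidate_seeds, p=seed_probs))
    g = directional_grad(params, seed)
    z = np.random.default_rng(seed).standard_normal(params.shape)
    params -= 0.05 * g * z
    history.append((seed, g))                # a few bytes per step, independent of model size

# ----- server side ----------------------------------------------------------
rebuilt = replay_updates(np.zeros(8), history)   # same trajectory, rebuilt from the tiny message
assert np.allclose(rebuilt, params)
print(f"loss after replay: {toy_loss(rebuilt):.4f}")
```

Because each logged entry is one integer seed plus one scalar, the message size grows with the number of optimization steps rather than with the number of model parameters, which is what keeps the per-round communication cost in the kilobyte range even for billion-parameter models.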