Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Abstract

Pre-trained large language models (LLMs) require fine-tuning to improve their responsiveness to natural language instructions. Federated learning (FL) offers a way to fine-tune on the abundant data held by end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance attainable with full-parameter tuning. However, the communication overhead of full-parameter tuning is prohibitively high for both servers and clients. This work introduces FedKSeed, a novel approach that employs zeroth-order optimization (ZOO) with a set of random seeds. It enables federated full-parameter tuning of billion-sized LLMs directly on devices. The method reduces the data transmitted between the server and clients to a few scalar gradients and random seeds, amounting to only a few thousand bytes. Building on this, we develop a strategy to assess the significance of ZOO perturbations for FL, allowing for probability-differentiated seed sampling that prioritizes perturbations with a greater impact on model accuracy. Experiments across six scenarios with different LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and generalization to new tasks.
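The idea of exchanging only random seeds and scalar gradients can be illustrated with a small sketch. The abstract does not spell out the update rule, so the code below assumes a standard two-point zeroth-order estimator on a toy quadratic loss; all function names, the candidate seed pool size, the learning rate, and the loss itself are hypothetical choices for illustration, not FedKSeed's actual implementation.

```python
import numpy as np

def perturbation(seed, dim):
    # The perturbation is fully determined by its seed, so it never needs to be sent.
    return np.random.default_rng(seed).standard_normal(dim)

def scalar_gradient(loss_fn, w, seed, eps=1e-3):
    # Two-point zeroth-order estimate of the directional derivative along z(seed).
    z = perturbation(seed, w.size)
    return (loss_fn(w + eps * z) - loss_fn(w - eps * z)) / (2 * eps)

def apply_update(w, seed, g, lr):
    # Full-parameter step rebuilt from one (seed, scalar gradient) pair.
    return w - lr * g * perturbation(seed, w.size)

def client_steps(loss_fn, w, seeds, lr=0.02, eps=1e-3):
    # Local training: returns the new weights and the compact history to transmit.
    history = []
    for seed in seeds:
        seed = int(seed)
        g = scalar_gradient(loss_fn, w, seed, eps)
        w = apply_update(w, seed, g, lr)
        history.append((seed, float(g)))
    return w, history

def replay(w, history, lr=0.02):
    # Receiver side: reproduce the same weights from (seed, gradient) pairs only.
    for seed, g in history:
        w = apply_update(w, seed, g, lr)
    return w

# Toy setup (assumed): quadratic loss in 8 dimensions, pool of 16 candidate seeds.
target = np.linspace(-1.0, 1.0, 8)
loss = lambda w: float(np.sum((w - target) ** 2))
w0 = np.zeros(8)
candidate_seeds = np.arange(16)
seeds = np.random.default_rng(0).choice(candidate_seeds, size=200)

w_client, history = client_steps(loss, w0, seeds)
w_replayed = replay(w0, history)
```

In this sketch the only data that would cross the network is `history`, a list of integer seeds and floating-point scalars, while the parameter vector itself stays on each side and is reconstructed by replaying the updates.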
