Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy

Abstract

Federated learning (FL) is a distributed machine learning method in which multiple devices collaboratively train a model under the coordination of a central server without sharing their underlying data. A key challenge in FL is the communication bottleneck caused by differences in connection speed and bandwidth across devices, which makes it essential to reduce the size of the data transmitted during training. There is also a risk that sensitive information is exposed through analysis of the model or its gradients during training. To address privacy and communication efficiency together, we combine differential privacy (DP) with adaptive quantization. For privacy we use Laplacian-based DP, which is relatively underexplored in FL and offers tighter privacy guarantees than Gaussian-based DP. We propose a simple and efficient global bit-length scheduler based on round-based cosine annealing, together with a client-based scheduler that adapts dynamically to each client's contribution, estimated through dataset entropy analysis. We evaluate the approach in extensive experiments on CIFAR10, MNIST, and medical imaging datasets, using non-IID data distributions with varying client counts, bit-length schedulers, and privacy budgets. The results show that, compared with 32-bit floating-point training, the adaptive quantization methods reduce total communicated data by up to 52.64% for MNIST, 45.06% for CIFAR10, and 31% to 37% for the medical imaging datasets, while maintaining competitive model accuracy and ensuring robust privacy through differential privacy.
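To make the three building blocks named in the abstract concrete, the following is a minimal Python sketch of a round-based cosine-annealed bit length, Laplace noise addition, and uniform quantization of a client update. It is not the authors' implementation. The function names, the 32-to-8-bit range, the direction of the annealing (high precision first), the min/max uniform quantizer, and the caller-supplied L1 sensitivity are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random


def cosine_bit_length(round_idx, total_rounds, start_bits=32, end_bits=8):
    """Cosine-annealed bit length for one round.

    Assumption: the schedule moves from start_bits to end_bits over training;
    the actual range and direction used in the paper are not given here.
    """
    if total_rounds <= 1:
        return end_bits
    progress = round_idx / (total_rounds - 1)            # 0.0 first round, 1.0 last
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays from 1.0 to 0.0
    return round(end_bits + (start_bits - end_bits) * cosine)


def add_laplace_noise(values, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to every coordinate.

    Assumption: `sensitivity` is the L1 sensitivity of the whole update,
    supplied by the caller (for example, enforced by clipping).
    """
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


def quantize(values, bits):
    """Uniformly quantize values onto 2**bits levels over their [min, max] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    update = [rng.uniform(-1.0, 1.0) for _ in range(5)]  # stand-in client update
    for r in range(0, 10, 3):
        bits = cosine_bit_length(r, total_rounds=10)
        noisy = add_laplace_noise(update, sensitivity=1.0, epsilon=1.0, rng=rng)
        sent = quantize(noisy, bits)
        # Payload relative to sending 32-bit floats for the same update.
        print(r, bits, f"{bits / 32:.2%} of float32 payload", len(sent))
```

In this sketch the noise is added before quantization, so the quantizer only post-processes an already privatized update; whether the paper applies the two steps in this order is likewise an assumption.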
