Exploring Federated Pruning for Large Language Models

Abstract

LLM pruning has emerged as a promising technique for compressing LLMs, enabling their deployment on resource-limited devices. However, current methods typically require access to public calibration samples, which can be difficult to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to compute a pruning mask matrix from its local calibration data and share it with the server, which then prunes the global model. This allows the global model to be pruned collaboratively using the knowledge of every client while keeping local data private. We also conduct extensive experiments to explore the design choices within the FedPrLLM framework, including different comparison groups, pruning strategies, and whether to scale weights. Our evaluation reveals that one-shot pruning with layer comparison and no weight scaling is the optimal choice within the FedPrLLM framework. We hope our work will help guide future efforts in pruning LLMs in privacy-sensitive fields. Our code is available at https://github.com/Pengxin-Guo/FedPrLLM.
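The mask-sharing workflow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the local importance metric or how the server combines masks, so the sketch assumes a Wanda-style score (weight magnitude times input-feature norm) on each client and a simple vote count on the server. All function names (`importance_scores`, `top_k_mask`, `client_prune_mask`, `server_prune`) are hypothetical. It shows one-shot pruning with layer comparison (candidates are ranked across the whole layer) and no weight scaling (retained weights are left unchanged).

```python
import math
import random


def importance_scores(weights, calib_inputs):
    """Assumed local metric: |W[i][j]| times the L2 norm of input feature j."""
    n_in = len(weights[0])
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weights]


def top_k_mask(scores, k, largest):
    """Return a 0/1 matrix marking k entries, ranked across the whole layer."""
    flat = [(s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=largest)
    mask = [[0] * len(row) for row in scores]
    for _, i, j in flat[:k]:
        mask[i][j] = 1
    return mask


def client_prune_mask(weights, calib_inputs, sparsity):
    """Client side: 1 marks a weight this client would prune (lowest scores)."""
    k = int(sparsity * len(weights) * len(weights[0]))
    return top_k_mask(importance_scores(weights, calib_inputs), k, largest=False)


def server_prune(weights, client_masks, sparsity):
    """Server side (assumed rule): sum the votes, prune the k most-voted weights."""
    rows, cols = len(weights), len(weights[0])
    votes = [[sum(m[i][j] for m in client_masks) for j in range(cols)]
             for i in range(rows)]
    final_mask = top_k_mask(votes, int(sparsity * rows * cols), largest=True)
    # No weight scaling: retained weights keep their original values.
    pruned = [[0.0 if final_mask[i][j] else weights[i][j] for j in range(cols)]
              for i in range(rows)]
    return pruned, final_mask


# Toy run: one 4x8 layer, three clients with 16 private calibration samples each.
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
clients = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
           for _ in range(3)]
masks = [client_prune_mask(W, X, 0.5) for X in clients]  # only masks leave clients
W_pruned, final_mask = server_prune(W, masks, 0.5)
```

Only the binary masks are sent to the server in this sketch; the calibration samples stay on each client, which is the privacy property the abstract emphasizes.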
