Exploring Federated Pruning for Large Language Models

May 19, 2025
Authors: Pengxin Guo, Yinong Wang, Wei Li, Mengting Liu, Ming Li, Jinkai Zheng, Liangqiong Qu
cs.AI

Abstract

LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies typically require access to public calibration samples, which can be challenging to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to calculate a pruning mask matrix based on its local calibration data and share it with the server to prune the global model. This approach allows for collaborative pruning of the global model with the knowledge of each client while maintaining local data privacy. Additionally, we conduct extensive experiments to explore various possibilities within the FedPrLLM framework, including different comparison groups, pruning strategies, and the decision to scale weights. Our extensive evaluation reveals that one-shot pruning with layer comparison and no weight scaling is the optimal choice within the FedPrLLM framework. We hope our work will help guide future efforts in pruning LLMs in privacy-sensitive fields. Our code is available at https://github.com/Pengxin-Guo/FedPrLLM.
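To make the workflow concrete, below is a minimal NumPy sketch of the client/server protocol the abstract describes. The Wanda-style importance score (|W| scaled by input-activation norms) and the vote-based mask aggregation on the server are illustrative assumptions; the abstract itself states only that each client computes a pruning mask matrix from its local calibration data and that the server applies one-shot pruning with layer comparison and no weight scaling.

```python
import numpy as np

def client_mask(weight: np.ndarray, calib_acts: np.ndarray,
                sparsity: float) -> np.ndarray:
    """Compute a binary pruning mask from a client's local calibration data.

    Scoring is Wanda-style (|W| * input-activation norm) -- an assumption,
    since the abstract does not fix the importance metric. All weights in
    the layer are ranked together, matching the paper's finding that layer
    comparison is the better comparison group.
    """
    act_norm = np.linalg.norm(calib_acts, axis=0)      # (in_features,)
    scores = np.abs(weight) * act_norm                 # (out_features, in_features)
    k = int(sparsity * scores.size)                    # number of weights to prune
    threshold = np.partition(scores.ravel(), k)[k]
    return (scores >= threshold).astype(np.float32)    # 1 = keep, 0 = prune

def server_prune(weight: np.ndarray, client_masks: list,
                 sparsity: float) -> np.ndarray:
    """Aggregate client masks and prune the global layer in one shot.

    Vote counting (keep the weights most clients kept) is assumed here for
    illustration; ties make the final sparsity approximate. No weight
    scaling is applied afterwards, per the paper's conclusion that
    scaling does not help.
    """
    votes = np.sum(client_masks, axis=0)               # keep-votes per weight
    k = int(sparsity * votes.size)
    threshold = np.partition(votes.ravel(), k)[k]
    return weight * (votes >= threshold)               # one-shot, no rescaling

# Toy example: three clients jointly prune a 4x8 layer to 50% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
masks = [client_mask(W, rng.normal(size=(16, 8)), 0.5) for _ in range(3)]
W_pruned = server_prune(W, masks, 0.5)
```

Note that only binary masks leave the clients; the calibration data itself never does, which is what keeps local data private in this setting.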
