ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
May 5, 2025
Authors: Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, Nikos Komodakis, Sergey Zagoruyko
cs.AI
Abstract
We introduce ReplaceMe, a generalized training-free depth pruning method that
effectively replaces transformer blocks with a linear operation, while
maintaining high performance for low compression ratios. In contrast to
conventional pruning approaches that require additional training or
fine-tuning, our approach requires only a small calibration dataset that is
used to estimate a linear transformation to approximate the pruned blocks. This
estimated linear mapping can be seamlessly merged with the remaining
transformer blocks, eliminating the need for any additional network parameters.
Our experiments show that ReplaceMe consistently outperforms other
training-free approaches and remains highly competitive with state-of-the-art
pruning methods that involve extensive retraining/fine-tuning and architectural
modifications. Applied to several large language models (LLMs), ReplaceMe
achieves up to 25% pruning while retaining approximately 90% of the original
model's performance on open benchmarks - without any training or healing steps,
resulting in minimal computational overhead (see Fig. 1). We provide an
open-source library implementing ReplaceMe alongside several state-of-the-art
depth pruning techniques, available at this repository.
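The core mechanism described in the abstract (estimating a linear map from calibration activations to stand in for a span of pruned transformer blocks, then folding that map into an existing weight matrix) can be sketched as follows. This is a minimal illustrative sketch rather than the paper's implementation: the function names, the ordinary-least-squares estimator via `torch.linalg.lstsq`, the tensor shapes, and the choice of merge target are all assumptions.

```python
import torch

def estimate_linear_replacement(X_in: torch.Tensor, X_out: torch.Tensor) -> torch.Tensor:
    """Estimate a square matrix T such that X_in @ T approximates X_out.

    X_in:  (N, d) hidden states entering the first pruned block, collected
           by running the model on a small calibration dataset.
    X_out: (N, d) hidden states leaving the last pruned block.
    Here T is obtained with ordinary least squares; the paper's estimator
    may differ (assumption made for illustration).
    """
    return torch.linalg.lstsq(X_in, X_out).solution  # shape (d, d)


def fuse_into_linear(prev_weight: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
    """Fold T into a preceding linear layer so no extra parameters remain.

    If the preceding layer computes y = x @ W.T (PyTorch nn.Linear convention,
    W of shape (d, in_features)) and the replacement applies T right after it,
    then y @ T = x @ (T.T @ W).T, so the fused weight is T.T @ W.
    Which layer plays the role of W (e.g. a down-projection just before the
    pruned span) is an assumption here.
    """
    return T.T @ prev_weight
```

In practice, `X_in` and `X_out` would be collected by hooking the hidden states around a candidate block span during a forward pass over the calibration set, the span to prune would be chosen by some importance or similarity criterion, and the pruned blocks would then be dropped with the fused weights taking their place; those engineering details are omitted from this sketch.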