ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
May 5, 2025
作者: Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, Nikos Komodakis, Sergey Zagoruyko
cs.AI
Abstract
We introduce ReplaceMe, a generalized training-free depth pruning method that
effectively replaces transformer blocks with a linear operation, while
maintaining high performance for low compression ratios. In contrast to
conventional pruning approaches that require additional training or
fine-tuning, our approach requires only a small calibration dataset that is
used to estimate a linear transformation to approximate the pruned blocks. This
estimated linear mapping can be seamlessly merged with the remaining
transformer blocks, eliminating the need for any additional network parameters.
Our experiments show that ReplaceMe consistently outperforms other
training-free approaches and remains highly competitive with state-of-the-art
pruning methods that involve extensive retraining/fine-tuning and architectural
modifications. Applied to several large language models (LLMs), ReplaceMe
achieves up to 25% pruning while retaining approximately 90% of the original
model's performance on open benchmarks - without any training or healing steps,
resulting in minimal computational overhead (see Fig. 1). We provide an
open-source library implementing ReplaceMe alongside several state-of-the-art
depth pruning techniques, available at this repository.
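For illustration, the core estimation step described in the abstract can be viewed as an ordinary least-squares problem: given hidden states collected on the calibration set at the input of the first pruned block and at the output of the last pruned block, fit a single square matrix that maps one onto the other, then fold that matrix into an adjacent linear layer so no extra parameters remain. The sketch below is a minimal PyTorch illustration under these assumptions, not the released implementation; the function names, tensor shapes, and the plain Frobenius-norm objective are illustrative, and the paper's actual merge into the remaining transformer blocks may differ in detail (e.g. how residual connections are handled).

```python
import torch

@torch.no_grad()
def estimate_linear_replacement(h_in: torch.Tensor, h_out: torch.Tensor) -> torch.Tensor:
    """Least-squares estimate of a linear map T with h_in @ T ≈ h_out.

    h_in:  hidden states entering the first pruned block, shape (N, d),
           collected by running the calibration set through the model.
    h_out: hidden states leaving the last pruned block, shape (N, d).
    Returns T of shape (d, d).
    """
    # Solve min_T ||h_in @ T - h_out||_F with a standard least-squares solver.
    return torch.linalg.lstsq(h_in.float(), h_out.float()).solution


@torch.no_grad()
def fold_into_linear(prev_linear: torch.nn.Linear, T: torch.Tensor) -> None:
    """Fold T into an adjacent nn.Linear so it adds no new parameters.

    If the layer computes y = x @ W.T + b and T is applied to y afterwards,
    the composition equals a single Linear with weight T.T @ W and bias b @ T.
    (Residual paths around the pruned blocks are ignored in this sketch.)
    """
    W = prev_linear.weight  # shape (d, in_features)
    prev_linear.weight.copy_(T.T.to(W.dtype) @ W)
    if prev_linear.bias is not None:
        prev_linear.bias.copy_(prev_linear.bias @ T.to(prev_linear.bias.dtype))
```

In this simplified view, pruning a contiguous span of blocks amounts to deleting them from the model and calling `fold_into_linear` on a linear layer that precedes the cut with the `T` returned by `estimate_linear_replacement`.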