LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
October 24, 2023
Authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
cs.AI
Abstract
Large Language Models (LLMs) have transformed the landscape of artificial
intelligence, while their enormous size presents significant challenges in
terms of computational costs. We introduce LoRAShear, a novel efficient
approach to structurally prune LLMs and recover knowledge. Given general LLMs,
LoRAShear first creates dependency graphs to discover minimal removal
structures and analyze the knowledge distribution. It then performs progressive
structured pruning on LoRA adaptors and enables inherent knowledge transfer to
better preserve the information in the redundant structures. To recover the
knowledge lost during pruning, LoRAShear carefully studies and proposes a
dynamic fine-tuning scheme with dynamic data adaptors to effectively narrow
the performance gap to the full models. Numerical results demonstrate that,
using only one GPU within a couple of GPU days, LoRAShear effectively
reduces the footprint of LLMs by 20% with only 1.0% performance degradation and
significantly outperforms the state of the art. The source code will be available
at https://github.com/microsoft/lorashear.
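To make the idea of structured pruning on LoRA adaptors concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): each removable structure is a group of rows in a LoRA-B matrix, groups are scored by their L2 norm, and the lowest-scoring groups are dropped wholesale. The function names, the norm-based score, and the fixed keep ratio are all illustrative assumptions.

```python
# Hypothetical sketch of group-wise structured pruning on a LoRA adaptor.
# Scoring by L2 norm and the keep-ratio heuristic are illustrative only,
# not the scoring used by LoRAShear.
import math

def lora_group_scores(B, groups):
    """Score each removable structure by the L2 norm of its LoRA-B rows.

    B: LoRA-B matrix as a list of rows (list of lists of floats).
    groups: list of row-index groups, one group per minimal removal structure.
    """
    scores = []
    for g in groups:
        sq = sum(B[i][j] ** 2 for i in g for j in range(len(B[0])))
        scores.append(math.sqrt(sq))
    return scores

def prune_groups(B, groups, keep_ratio=0.8):
    """Keep the highest-scoring fraction of structures; drop the rest.

    Returns the pruned B (kept rows only) and the indices of kept groups.
    """
    scores = lora_group_scores(B, groups)
    k = max(1, int(len(groups) * keep_ratio))
    keep = sorted(sorted(range(len(groups)), key=lambda i: -scores[i])[:k])
    kept_rows = [i for gi in keep for i in groups[gi]]
    return [B[i] for i in kept_rows], keep
```

In a real pipeline the surviving rows of B (and the matching columns of A) would then be fine-tuned to recover the pruned model's performance.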