LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
October 24, 2023
Authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
cs.AI
Abstract
Large Language Models (LLMs) have transformed the landscape of artificial
intelligence, while their enormous size presents significant challenges in
terms of computational costs. We introduce LoRAShear, a novel efficient
approach to structurally prune LLMs and recover knowledge. Given general LLMs,
LoRAShear first creates dependency graphs to discover minimal removal
structures and analyze the knowledge distribution. It then performs progressive
structured pruning on LoRA adaptors and enables inherent knowledge transfer to
better preserve the information in the redundant structures. To recover the
knowledge lost during pruning, LoRAShear carefully studies and proposes a
dynamic fine-tuning scheme with dynamic data adaptors to effectively narrow
the performance gap to the full models. Numerical results demonstrate that,
using only one GPU within a couple of GPU days, LoRAShear effectively
reduces the footprint of LLMs by 20% with only 1.0% performance degradation and
significantly outperforms the state of the art. The source code will be available
at https://github.com/microsoft/lorashear.
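To make the idea of structured pruning on LoRA adaptors concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): each removable structure is a group of rows in a LoRA-B matrix, groups are scored by their L2 norm, and the lowest-scoring groups are dropped wholesale. The function names, the norm-based score, and the fixed keep ratio are all illustrative assumptions.

```python
# Hypothetical sketch of group-wise structured pruning on a LoRA adaptor.
# Scoring by L2 norm and the keep-ratio heuristic are illustrative only,
# not the scoring used by LoRAShear.
import math

def lora_group_scores(B, groups):
    """Score each removable structure by the L2 norm of its LoRA-B rows.

    B: LoRA-B matrix as a list of rows (list of lists of floats).
    groups: list of row-index groups, one group per minimal removal structure.
    """
    scores = []
    for g in groups:
        sq = sum(B[i][j] ** 2 for i in g for j in range(len(B[0])))
        scores.append(math.sqrt(sq))
    return scores

def prune_groups(B, groups, keep_ratio=0.8):
    """Keep the highest-scoring fraction of structures; drop the rest.

    Returns the pruned B (kept rows only) and the indices of kept groups.
    """
    scores = lora_group_scores(B, groups)
    k = max(1, int(len(groups) * keep_ratio))
    keep = sorted(sorted(range(len(groups)), key=lambda i: -scores[i])[:k])
    kept_rows = [i for gi in keep for i in groups[gi]]
    return [B[i] for i in kept_rows], keep
```

In a real pipeline the surviving rows of B (and the matching columns of A) would then be fine-tuned to recover the pruned model's performance.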