LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
October 24, 2023
Authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
cs.AI
Abstract
Large Language Models (LLMs) have transformed the landscape of artificial
intelligence, but their enormous size presents significant challenges in
terms of computational cost. We introduce LoRAShear, a novel and efficient
approach to structurally prune LLMs and recover knowledge. Given a general
LLM, LoRAShear first creates dependency graphs to discover minimal removal
structures and analyze the knowledge distribution. It then performs
progressive structured pruning on LoRA adaptors and enables inherent
knowledge transfer to better preserve the information in the redundant
structures. To recover the knowledge lost during pruning, LoRAShear
proposes a dynamic fine-tuning scheme with dynamic data adaptors to
effectively narrow the performance gap to the full models. Numerical
results demonstrate that, using only one GPU for a couple of GPU days,
LoRAShear effectively reduces the footprint of LLMs by 20% with only 1.0%
performance degradation, significantly outperforming the state of the art.
The source code will be available at https://github.com/microsoft/lorashear.
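To make the core idea concrete, the sketch below illustrates structured pruning on a LoRA-adapted weight at the output-channel granularity. This is a hypothetical toy example, not the authors' implementation: the effective weight is W + B @ A (with LoRA factors A and B), channel importance is scored by a simple L2 norm, and the lowest-scoring channels are zeroed as a whole, mimicking the removal of minimal structures rather than individual weights.

```python
import numpy as np

# Toy setup (dimensions are illustrative, not from the paper)
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 4
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA down-projection
B = rng.normal(size=(d_out, r))      # LoRA up-projection

def prune_channels(W, A, B, sparsity=0.25):
    """Zero the lowest-importance output channels of W + B @ A.

    Uses a hypothetical L2-norm importance score per output channel;
    whole rows are removed together (structured pruning), unlike
    unstructured per-weight pruning.
    """
    W_eff = W + B @ A                        # effective LoRA-adapted weight
    scores = np.linalg.norm(W_eff, axis=1)   # per-channel importance
    k = int(sparsity * W_eff.shape[0])       # number of channels to remove
    pruned = np.argsort(scores)[:k]          # least important channels
    W_eff[pruned, :] = 0.0                   # structured removal
    return W_eff, pruned

W_pruned, removed = prune_channels(W, A, B)
print(len(removed))  # 2 of 8 channels removed at 25% sparsity
```

In a real pipeline the zeroed channels would be physically deleted (shrinking the matrix), and dependent structures discovered via the dependency graph would be removed in tandem; the norm-based score here stands in for whatever saliency criterion the method actually uses.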