DarwinLM: Evolutionary Structured Pruning of Large Language Models
February 11, 2025
Authors: Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh
cs.AI
Abstract
Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation and selecting the fittest for survival. To assess the effect of post-training, we incorporate a lightweight, multistep training process within the offspring population, progressively increasing the number of tokens and eliminating poorly performing models in each selection stage. We validate our method through extensive experiments on Llama-2-7B, Llama-3.1-8B, and Qwen-2.5-14B-Instruct, achieving state-of-the-art performance for structured pruning. For instance, DarwinLM surpasses ShearedLlama while requiring 5x less training data during post-compression training.
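To make the search procedure described in the abstract concrete, below is a minimal sketch (not the authors' released code) of how the mutate-train-select loop could be organized. The helpers mutate, lightweight_finetune, and evaluate, as well as the token budgets and survivor counts, are hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Sequence

def evolutionary_structured_pruning(
    parent,                               # current pruned model / sparsity configuration
    mutate: Callable,                     # produces a mutated offspring from the parent
    lightweight_finetune: Callable,       # briefly finetunes a model on a token budget
    evaluate: Callable,                   # fitness score, e.g. validation loss (lower is better)
    generations: int = 10,
    offspring_per_gen: int = 16,
    token_schedule: Sequence[int] = (10_000, 50_000, 200_000),  # tokens per selection stage
    survivors_per_stage: Sequence[int] = (8, 4, 1),             # models kept after each stage
):
    for _ in range(generations):
        # Mutation: derive candidate non-uniform sparsity configurations from the parent.
        population = [mutate(parent) for _ in range(offspring_per_gen)]

        # Training-aware selection: finetune survivors on a progressively larger
        # token budget and eliminate poorly performing models at each stage.
        for tokens, keep in zip(token_schedule, survivors_per_stage):
            population = [lightweight_finetune(m, num_tokens=tokens) for m in population]
            population.sort(key=evaluate)   # best (lowest-loss) candidates first
            population = population[:keep]

        # The fittest offspring seeds the next generation.
        parent = population[0]
    return parent
```

The multistep schedule is the point of the design: cheap early stages filter out clearly weak substructures before any substantial training budget is spent, so only a few candidates ever see the larger token budgets.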