The LLM Surgeon
December 28, 2023
Authors: Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort
cs.AI
Abstract
State-of-the-art language models are becoming increasingly large in an effort
to achieve the highest performance on large corpora of available textual data.
However, the sheer size of the Transformer architectures makes it difficult to
deploy models within computational, environmental or device-specific
constraints. We explore data-driven compression of existing pretrained models
as an alternative to training smaller models from scratch. To do so, we scale
Kronecker-factored curvature approximations of the target loss landscape to
large language models. In doing so, we can compute both the dynamic allocation
of structures that can be removed as well as updates of remaining weights that
account for the removal. We provide a general framework for unstructured,
semi-structured and structured pruning and improve upon weight updates to
capture more correlations between weights, while remaining computationally
efficient. Experimentally, our method can prune rows and columns from a range
of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance,
and achieve state-of-the-art results in unstructured and semi-structured
pruning of large language models.
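The core mechanics behind the abstract — a curvature (Hessian) approximation that factorises as a Kronecker product, used to score which weights to remove and to update the remaining weights — can be illustrated with a minimal numpy sketch. This is a toy, hypothetical reconstruction of the classic Optimal Brain Surgeon cost and update under a Kronecker-factored curvature, not the paper's implementation: the random inputs and gradients, the damping value, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = W x, with W of shape (d_out, d_in).
d_in, d_out, n = 4, 3, 256
W = rng.normal(size=(d_out, d_in))

# Stand-ins for collected layer inputs and output gradients.
X = rng.normal(size=(n, d_in))    # layer inputs x
Gy = rng.normal(size=(n, d_out))  # gradients dL/dy

# Kronecker factors of the curvature, H ≈ G ⊗ A (row-major vec of W):
# A is the input second moment, G the output-gradient second moment.
A = X.T @ X / n + 1e-3 * np.eye(d_in)    # small damping for invertibility
G = Gy.T @ Gy / n + 1e-3 * np.eye(d_out)
A_inv = np.linalg.inv(A)
G_inv = np.linalg.inv(G)

# OBS pruning cost for each weight: cost_ij = W_ij^2 / (2 [H^-1]_qq).
# With H^-1 = G^-1 ⊗ A^-1, its diagonal factorises entrywise.
Hinv_diag = np.outer(np.diag(G_inv), np.diag(A_inv))  # shape (d_out, d_in)
costs = W**2 / (2.0 * Hinv_diag)

# Remove the single cheapest weight and apply the compensating update
# delta_W = -(W_ij / [H^-1]_qq) * (G^-1 e_i)(A^-1 e_j)^T,
# which drives the pruned entry exactly to zero.
i, j = np.unravel_index(np.argmin(costs), costs.shape)
delta_W = -(W[i, j] / Hinv_diag[i, j]) * np.outer(G_inv[:, i], A_inv[:, j])
W_pruned = W + delta_W

print(f"pruned entry ({i},{j}); residual value: {W_pruned[i, j]:.2e}")
```

The Kronecker structure is what makes this tractable at LLM scale: instead of inverting a `(d_in*d_out) x (d_in*d_out)` Hessian, only the small `d_in x d_in` and `d_out x d_out` factors are inverted. The same machinery extends from single weights to whole rows and columns in the structured case.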