The LLM Surgeon
December 28, 2023
Authors: Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort
cs.AI
Abstract
State-of-the-art language models are becoming increasingly large in an effort
to achieve the highest performance on large corpora of available textual data.
However, the sheer size of the Transformer architectures makes it difficult to
deploy models within computational, environmental or device-specific
constraints. We explore data-driven compression of existing pretrained models
as an alternative to training smaller models from scratch. To do so, we scale
Kronecker-factored curvature approximations of the target loss landscape to
large language models. In doing so, we can compute both the dynamic allocation
of structures that can be removed as well as updates of remaining weights that
account for the removal. We provide a general framework for unstructured,
semi-structured and structured pruning and improve upon weight updates to
capture more correlations between weights, while remaining computationally
efficient. Experimentally, our method can prune rows and columns from a range
of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance,
and achieve state-of-the-art results in unstructured and semi-structured
pruning of large language models.
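The core mechanics behind the abstract — a curvature (Hessian) approximation that factorises as a Kronecker product, used to score which weights to remove and to update the remaining weights — can be illustrated with a minimal numpy sketch. This is a toy, hypothetical reconstruction of the classic Optimal Brain Surgeon cost and update under a Kronecker-factored curvature, not the paper's implementation: the random inputs and gradients, the damping value, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = W x, with W of shape (d_out, d_in).
d_in, d_out, n = 4, 3, 256
W = rng.normal(size=(d_out, d_in))

# Stand-ins for collected layer inputs and output gradients.
X = rng.normal(size=(n, d_in))    # layer inputs x
Gy = rng.normal(size=(n, d_out))  # gradients dL/dy

# Kronecker factors of the curvature, H ≈ G ⊗ A (row-major vec of W):
# A is the input second moment, G the output-gradient second moment.
A = X.T @ X / n + 1e-3 * np.eye(d_in)    # small damping for invertibility
G = Gy.T @ Gy / n + 1e-3 * np.eye(d_out)
A_inv = np.linalg.inv(A)
G_inv = np.linalg.inv(G)

# OBS pruning cost for each weight: cost_ij = W_ij^2 / (2 [H^-1]_qq).
# With H^-1 = G^-1 ⊗ A^-1, its diagonal factorises entrywise.
Hinv_diag = np.outer(np.diag(G_inv), np.diag(A_inv))  # shape (d_out, d_in)
costs = W**2 / (2.0 * Hinv_diag)

# Remove the single cheapest weight and apply the compensating update
# delta_W = -(W_ij / [H^-1]_qq) * (G^-1 e_i)(A^-1 e_j)^T,
# which drives the pruned entry exactly to zero.
i, j = np.unravel_index(np.argmin(costs), costs.shape)
delta_W = -(W[i, j] / Hinv_diag[i, j]) * np.outer(G_inv[:, i], A_inv[:, j])
W_pruned = W + delta_W

print(f"pruned entry ({i},{j}); residual value: {W_pruned[i, j]:.2e}")
```

The Kronecker structure is what makes this tractable at LLM scale: instead of inverting a `(d_in*d_out) x (d_in*d_out)` Hessian, only the small `d_in x d_in` and `d_out x d_out` factors are inverted. The same machinery extends from single weights to whole rows and columns in the structured case.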