ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

March 6, 2024
Authors: Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen
cs.AI

Abstract

As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters. However, in this study, we discovered that many layers of LLMs exhibit high similarity, and some layers play a negligible role in network functionality. Based on this observation, we define a metric called Block Influence (BI) to gauge the significance of each layer in LLMs. We then propose a straightforward pruning approach: layer removal, in which we directly delete the redundant layers in LLMs based on their BI scores. Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) methods in model pruning. Moreover, ShortGPT is orthogonal to quantization-like methods, enabling further reduction in parameters and computation. The ability to achieve better results through simple layer removal, as opposed to more complex pruning techniques, suggests a high degree of redundancy in the model architecture.
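
As an illustration only, the following is a minimal PyTorch sketch of how a BI-style layer ranking could be computed; it is not the authors' reference implementation. It assumes BI is taken as one minus the average cosine similarity between a layer's input and output hidden states collected on a small calibration set, and the names `block_influence`, `layers_to_remove`, and `per_layer_hidden` are hypothetical.

```python
import torch
import torch.nn.functional as F


def block_influence(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Score one transformer layer by how much it changes its input.

    Assumption: BI = 1 - mean cosine similarity between the hidden states
    entering the layer and those leaving it, so a layer that barely
    transforms its input receives a score near zero.

    hidden_in / hidden_out: (num_tokens, hidden_dim) activations gathered
    on a small calibration set.
    """
    cos = F.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return (1.0 - cos).mean().item()


def layers_to_remove(per_layer_hidden: list, num_remove: int) -> list:
    """Return the indices of the num_remove lowest-BI layers.

    per_layer_hidden[i] holds the hidden states at the input of layer i,
    and per_layer_hidden[i + 1] those at its output (hypothetical layout).
    """
    scores = [
        block_influence(per_layer_hidden[i], per_layer_hidden[i + 1])
        for i in range(len(per_layer_hidden) - 1)
    ]
    # Layers with the smallest influence are treated as the most redundant.
    return sorted(range(len(scores)), key=scores.__getitem__)[:num_remove]
```

Deleting the returned layers from the model's layer stack and re-evaluating on downstream tasks would then mimic the layer-removal procedure described in the abstract.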