ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
March 6, 2024
Authors: Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen
cs.AI
Abstract
As Large Language Models (LLMs) continue to advance in performance, their
size has escalated significantly, with current LLMs containing billions or even
trillions of parameters. However, in this study, we discovered that many layers
of LLMs exhibit high similarity, and some layers play a negligible role in
network functionality. Based on this observation, we define a metric called
Block Influence (BI) to gauge the significance of each layer in LLMs. We then
propose a straightforward pruning approach: layer removal, in which we directly
delete the redundant layers in LLMs based on their BI scores. Experiments
demonstrate that our method, which we call ShortGPT, significantly outperforms
previous state-of-the-art (SOTA) methods in model pruning. Moreover, ShortGPT
is orthogonal to quantization-like methods, enabling further reduction in
parameters and computation. The ability to achieve better results through
simple layer removal, as opposed to more complex pruning techniques, suggests a
high degree of redundancy in the model architecture.
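To make the pruning recipe concrete, the following is a minimal sketch, not the authors' released code, of how a Block-Influence-style score could be computed and used for layer removal with Hugging Face Transformers. It assumes BI is approximated as one minus the average cosine similarity between each layer's input and output hidden states; the model name, the calibration text, the helper `block_influence`, and the choice of `k` layers to drop are all illustrative assumptions.

```python
# Minimal sketch, not the authors' implementation. Assumptions: the model is a
# Hugging Face causal LM whose decoder stack lives in model.model.layers
# (true for LLaMA-style models), and Block Influence is approximated as
# 1 - mean cosine similarity between a layer's input and output hidden states.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any LLaMA-style LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def block_influence(model, tokenizer, texts):
    """One score per decoder layer; lower means the layer changes its hidden states less."""
    n_layers = model.config.num_hidden_layers
    scores = torch.zeros(n_layers)
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        # hidden[i] is the input to layer i, hidden[i + 1] is its output
        for i in range(n_layers):
            cos = F.cosine_similarity(hidden[i], hidden[i + 1], dim=-1)
            scores[i] += 1.0 - cos.mean()
    return scores / len(texts)

# Score layers on a small calibration set, then drop the k least influential ones.
calibration = ["Large language models contain billions of parameters."]
scores = block_influence(model, tokenizer, calibration)
k = 4  # number of layers to remove; illustrative
to_drop = set(scores.argsort()[:k].tolist())
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in to_drop
)
model.config.num_hidden_layers = len(model.model.layers)
```

Because decoder layers share the same input/output signature, the remaining layers can still be run end to end without retraining, which is what makes plain layer removal attractive. Depending on the Transformers version, per-layer indices used for KV caching may need to be reassigned after deletion before the pruned model is used for generation.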