ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Abstract

As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters. However, in this study we find that many layers of LLMs exhibit high similarity, and some layers contribute negligibly to the network's function. Based on this observation, we define a metric called Block Influence (BI) to gauge the significance of each layer in an LLM. We then propose a straightforward pruning approach, layer removal, in which we directly delete the redundant layers of an LLM based on their BI scores. Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) model pruning methods. Moreover, ShortGPT is orthogonal to quantization-like methods, so the two can be combined to further reduce parameters and computation. That simple layer removal achieves better results than more complex pruning techniques suggests a high degree of redundancy in the model architecture.
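The abstract does not spell out how BI is computed. The sketch below assumes, as we read the method, that a layer's BI is one minus the mean cosine similarity between the hidden state entering that layer and the hidden state leaving it, averaged over tokens, and that layer removal drops the layers with the lowest BI. The helper names (`block_influence`, `layers_to_remove`, `prune`) are hypothetical, and the hidden states are toy lists standing in for activations collected by running a calibration set through a real model.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def block_influence(hidden_states: List[List[Vector]]) -> List[float]:
    """Assumed BI: 1 - mean over tokens of cos(input, output) for each layer.

    hidden_states[i][t] is token t's hidden vector entering layer i; the list
    has num_layers + 1 entries, the last being the final layer's output.
    """
    scores = []
    for i in range(len(hidden_states) - 1):
        x_in, x_out = hidden_states[i], hidden_states[i + 1]
        sims = [cosine(a, b) for a, b in zip(x_in, x_out)]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


def layers_to_remove(bi_scores: List[float], n_remove: int) -> List[int]:
    """Indices of the n_remove layers with the lowest BI, in layer order."""
    ranked = sorted(range(len(bi_scores)), key=lambda i: bi_scores[i])
    return sorted(ranked[:n_remove])


def prune(layers: list, bi_scores: List[float], n_remove: int) -> list:
    """Return the layer list with the lowest-BI layers deleted outright."""
    drop = set(layers_to_remove(bi_scores, n_remove))
    return [layer for i, layer in enumerate(layers) if i not in drop]


# Toy example: 3 layers, 2 tokens, 2-dimensional hidden states.
h0 = [[1.0, 0.0], [0.0, 1.0]]
h1 = [[1.0, 0.0], [0.0, 1.0]]  # layer 0 leaves its input unchanged
h2 = [[0.0, 1.0], [1.0, 0.0]]  # layer 1 rotates every token by 90 degrees
h3 = [[0.0, 1.0], [1.0, 1.0]]  # layer 2 changes only the second token
scores = block_influence([h0, h1, h2, h3])
kept = prune(["layer0", "layer1", "layer2"], scores, n_remove=2)
print(scores, kept)
```

In this toy case layer 0 acts as an identity map, so its BI is 0 and it is the first candidate for deletion, while layer 1, which transforms its input the most, has BI 1 and is kept. Ranking by a single per-layer score keeps the pruning step trivial: no retraining or weight surgery is needed beyond dropping whole blocks.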
