The LLM Surgeon

Abstract

State-of-the-art language models are becoming increasingly large in an effort to achieve the highest performance on large corpora of available textual data. However, the sheer size of Transformer architectures makes it difficult to deploy models within computational, environmental or device-specific constraints. We explore data-driven compression of existing pretrained models as an alternative to training smaller models from scratch. To do so, we scale Kronecker-factored curvature approximations of the target loss landscape to large language models. This lets us compute both the dynamic allocation of structures that can be removed and the updates to the remaining weights that account for the removal. We provide a general framework for unstructured, semi-structured and structured pruning, and we improve the weight updates so that they capture more correlations between weights while remaining computationally efficient. Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30% with a negligible loss in performance, and it achieves state-of-the-art results in unstructured and semi-structured pruning of large language models.
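
As a rough illustration of the idea summarized above, the following sketch applies the classic Optimal Brain Surgeon step to a single linear layer, with the layer's curvature approximated by a Kronecker product of two small factors. It is not the authors' implementation: the function name `kfac_obs_prune_one`, the factor names `A` (input-side) and `G` (output-side), and the per-factor `damping` term are assumptions made for this example, and only one weight is removed per call, without the correlated multi-weight updates or the structured variants the abstract mentions.

```python
import numpy as np


def kfac_obs_prune_one(W, A, G, damping=1e-3):
    """Remove the single cheapest weight of a linear layer (illustrative only).

    W: (out_dim, in_dim) weight matrix.
    A: (in_dim, in_dim) input-side curvature factor (assumed name).
    G: (out_dim, out_dim) output-side curvature factor (assumed name).
    The curvature of the row-major flattened weights is approximated as
    kron(G, A); damping is added to each factor so both are invertible.
    """
    out_dim, in_dim = W.shape
    A_inv = np.linalg.inv(A + damping * np.eye(in_dim))
    G_inv = np.linalg.inv(G + damping * np.eye(out_dim))

    # Diagonal of kron(G_inv, A_inv), reshaped to the shape of W.
    inv_diag = np.outer(np.diag(G_inv), np.diag(A_inv))

    # Optimal Brain Surgeon cost of removing each weight on its own.
    costs = 0.5 * W**2 / inv_diag
    i, j = np.unravel_index(np.argmin(costs), costs.shape)

    # Update of all weights that compensates for zeroing W[i, j].
    # The column of kron(G_inv, A_inv) for (i, j) is an outer product.
    delta = -(W[i, j] / inv_diag[i, j]) * np.outer(G_inv[:, i], A_inv[:, j])
    W_new = W + delta
    W_new[i, j] = 0.0  # remove floating-point residue at the pruned entry
    return W_new, (int(i), int(j)), float(costs[i, j])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 4))    # stand-in for layer inputs
    dY = rng.normal(size=(64, 3))   # stand-in for output gradients
    A = X.T @ X / len(X)
    G = dY.T @ dY / len(dY)
    W = rng.normal(size=(3, 4))
    W_new, index, cost = kfac_obs_prune_one(W, A, G)
    print("pruned index:", index, "estimated loss increase:", cost)
```

Because the inverse of a Kronecker product is the Kronecker product of the inverses, the sketch only ever inverts the two small factors and never forms the full curvature matrix, which is what makes this family of approximations practical for large layers.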
