Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Abstract

Because of the sheer scale of Large Language Models (LLMs), directly applying conventional compression methods is impractical. Even minimal gradient updates demand computational resources that are hard to meet, particularly on consumer-grade hardware. This paper introduces an approach for the parametric and practical compression of LLMs based on reduced order modelling, which combines low-rank decomposition in the feature space with re-parameterization in the weight space. The technique operates layer by layer, so it needs no GPU and can compress billion-scale models under stringent memory and time constraints. By leveraging matrix decomposition, the method demonstrates superior efficacy compared to the prevailing state-of-the-art structured pruning method.
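As a rough illustration of what low-rank decomposition in the feature space followed by re-parameterization in the weight space could look like for a single linear layer, consider the NumPy sketch below. It is not taken from the paper: the function name `compress_linear_layer`, the use of an uncentered feature covariance, and the calibration batch `X` are all assumptions made for this example. The idea shown is to project the layer's output features onto their top principal directions and fold that projection back into two smaller weight matrices, all on CPU.

```python
import numpy as np


def compress_linear_layer(W, X, rank):
    """Hypothetical helper: factor one linear layer via its output features.

    W    : (d_in, d_out) weight matrix of the layer (y = x @ W).
    X    : (n, d_in) calibration inputs to the layer (an assumption of this sketch).
    rank : number of feature directions to keep.
    Returns W1 (d_in, rank) and W2 (rank, d_out) with x @ W1 @ W2 approximating x @ W.
    """
    Y = X @ W                                # output features, shape (n, d_out)
    cov = Y.T @ Y / Y.shape[0]               # uncentered feature covariance
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues come back in ascending order
    V = eigvecs[:, ::-1][:, :rank]           # top-`rank` feature directions, (d_out, rank)
    W1 = W @ V                               # fold the projection into the weights
    W2 = V.T                                 # map back to the original feature space
    return W1, W2


# Toy usage: a layer whose features are exactly rank 4 is recovered by a rank-4 factorization.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
W = rng.standard_normal((16, 4)) @ rng.standard_normal((4, 12))
W1, W2 = compress_linear_layer(W, X, rank=4)
relative_error = np.linalg.norm(X @ W1 @ W2 - X @ W) / np.linalg.norm(X @ W)
params_before, params_after = W.size, W1.size + W2.size
```

In this toy setting the factorized layer stores 112 parameters instead of 192. Because each layer is processed independently and only needs its own weights and a batch of features, a procedure of this kind fits in modest CPU memory, which is consistent with the layer-wise, GPU-free operation described in the abstract.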

