Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
December 12, 2023
Authors: Arnav Chavan, Nahush Lele, Deepak Gupta
cs.AI
Abstract
Due to the substantial scale of Large Language Models (LLMs), the direct
application of conventional compression methodologies proves impractical. The
computational demands associated with even minimal gradient updates present
challenges, particularly on consumer-grade hardware. This paper introduces an
innovative approach for the parametric and practical compression of LLMs based
on reduced order modelling, which entails low-rank decomposition within the
feature space and re-parameterization in the weight space. Notably, this
compression technique operates in a layer-wise manner, obviating the need for a
GPU device and enabling the compression of billion-scale models within
stringent constraints of both memory and time. Our method represents a
significant advancement in model compression by leveraging matrix
decomposition, demonstrating superior efficacy compared to the prevailing
state-of-the-art structured pruning method.
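
The abstract describes low-rank decomposition in the feature space followed by re-parameterization in the weight space, applied one layer at a time without a GPU. The snippet below is a minimal, hedged sketch of that general idea for a single linear layer: it builds a reduced-order basis from the layer's outputs on calibration data via a truncated SVD and then factors the weight matrix accordingly. The function name `rom_compress_layer`, the calibration-SVD choice, and the rank-selection interface are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rom_compress_layer(W, X_calib, rank):
    """Sketch of feature-space low-rank compression of one linear layer.

    W       : (d_out, d_in) weight matrix of the layer
    X_calib : (n, d_in) calibration inputs seen by this layer
    rank    : target rank k << min(d_out, d_in)
    Returns (W_up, W_down) such that W_up @ W_down approximates W
    on the subspace spanned by the calibration features.
    """
    # Layer responses (features) on the calibration batch.
    Y = X_calib @ W.T                        # (n, d_out)

    # Truncated SVD of the features: the top-k right singular vectors
    # form an orthonormal basis for the dominant feature directions.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V_k = Vt[:rank].T                        # (d_out, k)

    # Re-parameterize in weight space: W ~= V_k (V_k^T W),
    # i.e. one (d_out, k) factor and one (k, d_in) factor.
    W_down = V_k.T @ W                       # (k, d_in)
    W_up = V_k                               # (d_out, k)
    return W_up, W_down

# Toy usage (hypothetical sizes): compress a 1024x1024 layer to rank 128
# with 256 calibration samples; parameter count drops from d_out*d_in
# to k*(d_out + d_in). All computation is CPU-only NumPy.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 1024)).astype(np.float32)
    X = rng.standard_normal((256, 1024)).astype(np.float32)
    W_up, W_down = rom_compress_layer(W, X, rank=128)
    approx = X @ (W_up @ W_down).T           # compressed forward pass
```

Because the decomposition only needs the weights of one layer and a small batch of calibration features at a time, this layer-wise formulation is what keeps memory and compute requirements modest enough to run on consumer-grade hardware.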