

Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

December 12, 2023
Authors: Arnav Chavan, Nahush Lele, Deepak Gupta
cs.AI

Abstract

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order modelling, which entails low-rank decomposition within the feature space and re-parameterization in the weight space. Notably, this compression technique operates in a layer-wise manner, obviating the need for a GPU device and enabling the compression of billion-scale models within stringent constraints of both memory and time. Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
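
To make the general idea concrete, the sketch below shows one way a layer-wise, feature-space low-rank compression step could look in PyTorch: principal directions of a layer's output features are estimated from a small calibration set, and the weight is then re-parameterized as two smaller factors. This is a minimal illustration based only on the description in the abstract, not the authors' implementation; the function name `low_rank_reparameterize`, the calibration tensor, and the fixed rank argument are assumptions introduced for the example.

```python
import torch

def low_rank_reparameterize(weight: torch.Tensor,
                            calib_inputs: torch.Tensor,
                            rank: int):
    """Illustrative feature-space low-rank compression of one linear layer.

    weight:       (d_out, d_in) matrix of a linear layer
    calib_inputs: (n_samples, d_in) calibration activations fed to the layer
    rank:         target rank r < min(d_out, d_in)

    Returns factors (A, B) such that the original map y = x @ weight.T
    is approximated by y ≈ (x @ B.T) @ A.T, i.e. two smaller linear maps.
    """
    # Latent features produced by the layer on the calibration set.
    feats = calib_inputs @ weight.T                      # (n, d_out)

    # Principal directions of the feature space (a reduced-order basis):
    # eigenvectors of the feature covariance, top `rank` columns kept.
    U, _, _ = torch.linalg.svd(feats.T @ feats)          # (d_out, d_out)
    U_r = U[:, :rank]                                    # (d_out, r)

    # Re-parameterize in weight space: project the weight onto the
    # reduced feature basis (B), then lift back with U_r (A).
    A = U_r                                              # (d_out, r)
    B = U_r.T @ weight                                   # (r, d_in)
    return A, B
```

In such a scheme, the original layer would be replaced by two smaller linear layers (d_in -> r and r -> d_out), so the parameter count drops from d_out * d_in to r * (d_in + d_out); because each layer is handled independently from a small batch of calibration activations, the whole procedure can run layer by layer on CPU.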