Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression

April 2, 2026
Authors: Ruoling Qi, Yirui Liu, Xuaner Wu, Xiangyu Wang, Ming Li, Chen Chen, Jian Chen, Yin Chen, Qizhen Weng
cs.AI

Abstract

The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and the dynamic Key-Value cache. SVD-based compression provides a hardware-friendly solution to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimality, practical efficiency, and numerical stability. Swift-SVD incrementally aggregates the covariance of output activations over a batch of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets demonstrate that Swift-SVD outperforms state-of-the-art baselines, achieving the best compression accuracy while delivering 3-70X speedups in end-to-end compression time. Our code will be released upon acceptance.
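The abstract's core recipe can be sketched in a few lines: accumulate the covariance of a layer's output activations over input batches, run one eigendecomposition, and project the weight matrix onto the top eigenvectors; effective rank (the exponential of the entropy of the normalized eigenvalue spectrum) then gauges how compressible the layer is. The snippet below is a minimal, hedged illustration of these ideas, not the authors' implementation; all function names and the exact factorization choice are assumptions.

```python
import numpy as np

def aggregate_output_covariance(W, batches):
    """Incrementally accumulate the covariance of output activations Y = X @ W.T."""
    d_out = W.shape[0]
    C = np.zeros((d_out, d_out))
    for X in batches:          # X: (n_samples, d_in)
        Y = X @ W.T            # output activations of the linear layer
        C += Y.T @ Y           # running aggregation; no per-batch decomposition
    return C

def low_rank_factor(W, C, r):
    """Single eigendecomposition after aggregation: W is projected onto the
    top-r eigenvectors of the output covariance, giving W ~= A @ B."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    U_r = eigvecs[:, -r:]                  # top-r eigenvectors, (d_out, r)
    A = U_r                                # (d_out, r)
    B = U_r.T @ W                          # (r, d_in)
    return A, B

def effective_rank(C):
    """exp of the Shannon entropy of the normalized eigenvalue distribution."""
    eigvals = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    p = eigvals / eigvals.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))
```

As a sanity check, if the layer's weight matrix is exactly rank-2, the aggregated output covariance also has rank 2, so projecting onto its top-2 eigenvectors reconstructs the weights exactly and the effective rank stays at or below 2.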