SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

June 5, 2026
Authors: Ernests Lavrinovics, Marco Letizia, Roy Janco, Shai Segal, Johannes Bjerva, Maurizio Pierini
cs.AI

Abstract

We present SigmaScale, a method that learns auxiliary scaling matrices S to aid Large Language Model (LLM) compression based on truncated Singular Value Decomposition (SVD). Instead of deriving the scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that learned scaling lowers the effective intrinsic rank of weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods across perplexity and zero-shot benchmarks. By learning activation-aware transformations that adapt to the structure of individual model weights, SigmaScale explores a more flexible route to low-rank LLM compression. The advantages observed on specific tasks make our approach a viable option for applications that require lower LLM inference cost.
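The abstract describes the method only at a high level, so the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. Everything here is an assumption: the formulation W_hat = diag(1/r) SVD_k(diag(r) W diag(c)) diag(1/c), the activation-aware loss ||X (W - W_hat)^T||_F^2 over calibration activations X, "effective-rank entropy" taken as the Shannon entropy of normalized singular values (whose exponential is Roy and Vetterli's effective rank), and a greedy random search standing in for whatever optimizer the paper actually uses (the abstract does not say). All function names and the toy data with outlier channels are hypothetical.

```python
import numpy as np


def scaled_truncated_svd(W, r, c, k):
    """Rank-k factors of W computed in a diagonally rescaled space.

    Assumed formulation (illustrative, not the paper's exact definition):
        W_s   = diag(r) @ W @ diag(c)
        W_hat = diag(1/r) @ SVD_k(W_s) @ diag(1/c) = A @ B
    """
    W_s = r[:, None] * W * c[None, :]
    U, s, Vt = np.linalg.svd(W_s, full_matrices=False)
    A = (U[:, :k] * s[:k]) / r[:, None]  # fold diag(1/r) into the left factor
    B = Vt[:k, :] / c[None, :]           # fold diag(1/c) into the right factor
    return A, B


def activation_loss(W, A, B, X):
    """Activation-aware error ||X (W - A B)^T||_F^2; rows of X are calibration tokens."""
    return float(np.linalg.norm(X @ (W - A @ B).T) ** 2)


def effective_rank_entropy(W):
    """Shannon entropy of the normalized singular values (exp of it is the effective rank)."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def learn_scales(W, X, k, steps=300, step_size=0.05, seed=0):
    """Learn log-parameterized row/column scales by greedy random search.

    Hypothetical stand-in for a real optimizer: a perturbation is kept only if it
    lowers the activation-aware loss, so the result is never worse than plain SVD.
    """
    rng = np.random.default_rng(seed)
    log_r, log_c = np.zeros(W.shape[0]), np.zeros(W.shape[1])

    def loss(lr, lc):
        A, B = scaled_truncated_svd(W, np.exp(lr), np.exp(lc), k)
        return activation_loss(W, A, B, X)

    best = loss(log_r, log_c)  # identity scaling is plain truncated SVD
    for _ in range(steps):
        cand_r = log_r + step_size * rng.standard_normal(log_r.shape)
        cand_c = log_c + step_size * rng.standard_normal(log_c.shape)
        val = loss(cand_r, cand_c)
        if val < best:
            log_r, log_c, best = cand_r, cand_c, val
    return np.exp(log_r), np.exp(log_c), best


# Toy demo: a random weight and calibration activations with a few outlier channels.
rng = np.random.default_rng(42)
m, n, k = 32, 24, 6
W = rng.standard_normal((m, n))
channel_scale = np.ones(n)
channel_scale[:3] = 10.0  # hypothetical outlier input channels
X = rng.standard_normal((256, n)) * channel_scale

A0, B0 = scaled_truncated_svd(W, np.ones(m), np.ones(n), k)
baseline = activation_loss(W, A0, B0, X)

r, c, learned = learn_scales(W, X, k)
A, B = scaled_truncated_svd(W, r, c, k)
print(f"plain SVD loss:     {baseline:.1f}")
print(f"learned-scale loss: {learned:.1f}")
print(f"entropy plain={effective_rank_entropy(W):.3f} "
      f"scaled={effective_rank_entropy(r[:, None] * W * c[None, :]):.3f}")
```

Because the inverse scalings are folded into the factors A (m x k) and B (k x n), the compressed layer in this sketch costs the same at inference time as an unscaled rank-k factorization; the scales only change which rank-k subspace the truncation keeps.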