SigmaScale: Large Language Model Compression via SVD Low-Rank Decomposition and Learned Scaling Matrices

Abstract

We present SigmaScale, a method that learns auxiliary scaling matrices S to improve Large Language Model (LLM) compression based on truncated Singular Value Decomposition (SVD). Instead of deriving the scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that the learned scaling lowers the effective intrinsic rank of the weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with the compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods on perplexity and zero-shot benchmarks. Because its activation-aware transformations are learned, SigmaScale adapts to the structure of each model's weights and offers a more flexible route to low-rank LLM compression. The advantages observed on specific tasks make our approach a viable option for applications that need to reduce the compute cost of LLM inference.
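
To make the quantities named in the abstract concrete, the block below sketches one plausible formalization. It is an illustration under stated assumptions, not the paper's exact formulation: the symbols r, c, D_r, D_c, X, k, p_i, and H are introduced here, and the specific loss and entropy definitions are common choices that the abstract itself does not spell out.

```latex
% Assumed setup (not given in the abstract): a weight matrix W in R^{m x n},
% positive scaling vectors r in R^m and c in R^n, and calibration
% activations X in R^{n x T} for a layer computing y = W x.
\begin{align*}
  D_r &= \operatorname{diag}(r), \qquad D_c = \operatorname{diag}(c)
      && \text{(learned row and column scalings)} \\
  \tilde{W} &= D_r \, W \, D_c
      && \text{(scaled weight matrix)} \\
  \tilde{W}_k &= U_k \Sigma_k V_k^{\top}
      && \text{(rank-$k$ truncated SVD of $\tilde{W}$)} \\
  \hat{W} &= D_r^{-1} \, \tilde{W}_k \, D_c^{-1}
      && \text{(compressed weight after undoing the scaling)} \\
  \mathcal{L}(r, c) &= \bigl\lVert W X - \hat{W} X \bigr\rVert_F^2
      && \text{(one possible activation-aware loss)} \\
  p_i &= \frac{\sigma_i(\tilde{W})}{\sum_j \sigma_j(\tilde{W})}, \qquad
  H = -\sum_i p_i \log p_i
      && \text{(entropy of the normalized singular values)}
\end{align*}
```

Under these assumptions, setting every entry of r and c to one recovers plain truncated SVD of W. A lower H means the spectrum of the scaled matrix is concentrated in fewer singular values, so a rank-k truncation discards less of it, which is consistent with the reported correlation between effective-rank entropy and compression loss.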