CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
September 26, 2025
Authors: Dmitriy Shopkhoev, Denis Makhov, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis
cs.AI
Abstract
Post-training compression of large language models (LLMs) largely relies on
low-rank weight approximation, which represents each column of a weight matrix
in a shared low-dimensional subspace. While this is a computationally efficient
strategy, the imposed structural constraint is rigid and can lead to a
noticeable model accuracy drop. In this work, we propose CoSpaDi (Compression
via Sparse Dictionary Learning), a novel training-free compression framework
that replaces low-rank decomposition with a more flexible structured sparse
factorization in which each weight matrix is represented with a dense
dictionary and a column-sparse coefficient matrix. This formulation enables a
union-of-subspaces representation: different columns of the original weight
matrix are approximated in distinct subspaces spanned by adaptively selected
dictionary atoms, offering greater expressiveness than a single invariant
basis. Crucially, CoSpaDi leverages a small calibration dataset to optimize the
factorization such that the output activations of compressed projection layers
closely match those of the original ones, thereby minimizing functional
reconstruction error rather than mere weight approximation. This data-aware
strategy better preserves model fidelity at reasonable compression ratios without
any fine-tuning. Moreover, the resulting structured sparsity
allows efficient sparse-dense matrix multiplication and is compatible with
post-training quantization for further memory and latency gains. We evaluate
CoSpaDi across multiple Llama and Qwen models under per-layer and per-group
settings at 20-50% compression ratios, demonstrating consistent superiority
over state-of-the-art data-aware low-rank methods both in accuracy and
perplexity. Our results establish structured sparse dictionary learning as a
powerful alternative to conventional low-rank approaches for efficient LLM
deployment.
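
To make the formulation concrete, the sketch below illustrates in plain NumPy one way the calibration-guided factorization described above could look: a weight matrix W (used as Y = X @ W.T) is approximated as a dense dictionary D times a column-sparse coefficient matrix S, fitted so that the compressed outputs X @ D @ S match the original outputs X @ W.T on a small calibration batch X. The dictionary size k, per-column sparsity s, the OMP-style coding step, the alternating update rule, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of calibration-guided sparse dictionary factorization
# (illustrative only; shapes, hyperparameters, and update rules are assumptions,
# not the CoSpaDi reference implementation).
import numpy as np

def sparse_code(XD, T, s):
    """OMP-style coding: for each column of the target T (n x out), select at most
    s atoms of the calibrated dictionary XD = X @ D (n x k) and fit them by least
    squares, yielding a column-sparse coefficient matrix S (k x out)."""
    k, out = XD.shape[1], T.shape[1]
    S = np.zeros((k, out))
    for j in range(out):
        residual, support = T[:, j].copy(), []
        for _ in range(s):
            corr = np.abs(XD.T @ residual)
            corr[support] = -np.inf               # do not reselect chosen atoms
            support.append(int(np.argmax(corr)))
            coef, *_ = np.linalg.lstsq(XD[:, support], T[:, j], rcond=None)
            residual = T[:, j] - XD[:, support] @ coef
        S[support, j] = coef
    return S

def factorize_layer(W, X, k, s, iters=5):
    """Approximate W.T (in x out) by D @ S so that X @ D @ S matches the original
    layer outputs X @ W.T on calibration activations X (functional objective)."""
    rng = np.random.default_rng(0)
    T = X @ W.T                                   # original outputs on calibration data
    D = rng.standard_normal((W.shape[1], k)) / np.sqrt(W.shape[1])
    for _ in range(iters):
        S = sparse_code(X @ D, T, s)              # column-sparse coefficients
        # dictionary update: minimum-norm least-squares solution of
        # min_D ||X @ D @ S - T||_F, i.e. D = pinv(X) @ T @ pinv(S)
        D = np.linalg.pinv(X) @ T @ np.linalg.pinv(S)
    S = sparse_code(X @ D, T, s)                  # refresh coefficients for the final D
    return D, S

# Usage: D, S = factorize_layer(W, X_calib, k=256, s=16); the compressed layer then
# computes X @ D @ S instead of X @ W.T, with S stored in a sparse format.
```

Because every column of S keeps only s nonzero entries, each output column is reconstructed in its own small subspace of dictionary atoms (a union-of-subspaces model), and the sparse-dense product D @ S can exploit structured-sparse kernels at inference time.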