
COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

February 16, 2026
Authors: Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Baher Mohammad, Stamatios Lefkimmiatis
cs.AI

Abstract

Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often suffer from iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
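The two closed-form steps mentioned in the abstract can be illustrated with a minimal NumPy sketch: an orthogonal Procrustes update of the dictionary followed by single-step sparse coding of the coefficients. The shapes, the per-column top-k sparsity rule, and the use of a raw weight matrix without calibration weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions noted above), factorizing W ~ D @ C with
# an orthogonal dictionary D and sparse coefficients C.
import numpy as np

def procrustes_update(W: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Closed-form solution of argmin_D ||W - D C||_F s.t. D^T D = I
    (orthogonal Procrustes): D = U V^T where U S V^T = SVD(W C^T)."""
    U, _, Vt = np.linalg.svd(W @ C.T, full_matrices=False)
    return U @ Vt

def sparse_code(W: np.ndarray, D: np.ndarray, k: int) -> np.ndarray:
    """Analytical one-step sparse coding: since D has orthonormal columns,
    the least-squares coefficients are D^T W; keep the k largest-magnitude
    entries per column and zero the rest."""
    C = D.T @ W
    drop = np.argsort(np.abs(C), axis=0)[:-k, :]  # indices of entries to zero
    np.put_along_axis(C, drop, 0.0, axis=0)
    return C

# Toy usage on a random "weight" matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
C = rng.standard_normal((32, 256))      # initial coefficients, 32 atoms
D = procrustes_update(W, C)             # closed-form dictionary update
C = sparse_code(W, D, k=8)              # one-step sparse coefficients
print("relative error:", np.linalg.norm(W - D @ C) / np.linalg.norm(W))
```

Because the dictionary is orthogonal, both steps are non-iterative, which is what lets the method avoid the alternating optimization loops of standard sparse dictionary learning.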