COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

Abstract

Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often suffer from iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
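To make the two closed-form steps named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a plain Frobenius objective ||W - D S||_F with a square orthogonal dictionary D and at most k nonzeros per column of S, and it ignores the calibration data and the layer-wise allocation entirely. The function names `sparse_code` and `procrustes_update` are hypothetical.

```python
import numpy as np

def sparse_code(D, W, k):
    """For orthogonal D, the best k-sparse code of each column of W keeps
    the k largest-magnitude entries of D.T @ W (single-step hard thresholding)."""
    Z = D.T @ W
    S = np.zeros_like(Z)
    # Row indices of the k largest |Z| entries in every column.
    idx = np.argpartition(np.abs(Z), -k, axis=0)[-k:, :]
    cols = np.arange(Z.shape[1])
    S[idx, cols] = Z[idx, cols]
    return S

def procrustes_update(W, S):
    """Orthogonal Procrustes: the orthogonal D minimizing ||W - D S||_F
    is U @ Vt, where U, _, Vt is the SVD of W @ S.T."""
    U, _, Vt = np.linalg.svd(W @ S.T)
    return U @ Vt

# Toy example on a random matrix (illustrative only, not real model weights).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
D = np.linalg.qr(rng.standard_normal((8, 8)))[0]  # random orthogonal start

S = sparse_code(D, W, k=3)
err_before = np.linalg.norm(W - D @ S)
D_new = procrustes_update(W, S)
err_after = np.linalg.norm(W - D_new @ S)
print(err_before, err_after)
```

Because the starting dictionary is itself a feasible orthogonal matrix, the Procrustes step cannot increase the reconstruction error for the fixed code S. How COMPOT weights this objective with calibration activations is described in the paper and the linked repository, not in this sketch.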
