COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

Abstract

Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often depend on iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries, which enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
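The two closed-form building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it ignores the calibration data and the layer-wise allocation strategy entirely, and the function names (`procrustes_update`, `sparse_code`, `demo`), the matrix sizes, and the choice of keeping `k` entries per column are assumptions made for illustration only. It factorizes a weight matrix W as D S, with D orthogonal and S sparse, using the standard orthogonal Procrustes solution and top-k hard thresholding.

```python
import numpy as np

def procrustes_update(W, S):
    """Closed-form orthogonal D minimizing ||W - D S||_F subject to D^T D = I.

    Standard orthogonal Procrustes solution: if W S^T = U Sigma V^T, then D = U V^T.
    """
    U, _, Vt = np.linalg.svd(W @ S.T, full_matrices=False)
    return U @ Vt

def sparse_code(W, D, k):
    """Single-step sparse coding for an orthogonal dictionary D.

    Because D is orthogonal, ||W - D S||_F = ||D^T W - S||_F, so the best
    k-sparse column is obtained by keeping the k largest-magnitude entries
    of the corresponding column of D^T W.
    """
    C = D.T @ W
    # Row indices of the k largest |entries| in each column.
    idx = np.argpartition(-np.abs(C), k - 1, axis=0)[:k]
    S = np.zeros_like(C)
    np.put_along_axis(S, idx, np.take_along_axis(C, idx, axis=0), axis=0)
    return S

def demo(seed=0, m=16, n=64, k=4):
    """Hypothetical toy run on a random matrix (sizes are arbitrary)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, n))
    D = np.eye(m)                      # start from the identity dictionary
    S = sparse_code(W, D, k)
    err_before = np.linalg.norm(W - D @ S)
    D = procrustes_update(W, S)        # closed-form dictionary update
    S = sparse_code(W, D, k)           # closed-form coefficient update
    err_after = np.linalg.norm(W - D @ S)
    return D, S, err_before, err_after
```

Calling `demo()` returns the dictionary, the sparse coefficients, and the Frobenius reconstruction error before and after one Procrustes update. Since each step is the exact minimizer for its own variable, the error cannot increase between the two measurements.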
