SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Abstract

Post-training quantization has emerged as the most widely used strategy for deploying large language models at low precision. Still, current methods show perplexity degradation at bit-widths of 4 or fewer, partly because representing outliers causes precision issues in the parameters that share a scale with those outliers. This problem is especially pronounced for calibration-free, uniform quantization methods. We introduce SINQ, which augments existing post-training quantizers with an additional second-axis scale factor and a fast Sinkhorn-Knopp-style algorithm that finds scales to normalize per-row and per-column variances, thereby minimizing a novel per-matrix proxy target for quantization: the matrix imbalance. Our method has no interactions between layers and can be trivially applied to new architectures to quantize any linear layer. We evaluate our method on the Qwen3 model family and DeepSeek-V2.5. SINQ significantly improves WikiText2 and C4 perplexity over uncalibrated uniform quantization baselines and can be further enhanced by combining it with calibration and non-uniform quantization levels. Code to reproduce the results of this work and to easily quantize models using SINQ is available at https://github.com/huawei-csl/SINQ.
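The dual-axis scaling idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy version under stated assumptions. The function names `imbalance`, `dual_axis_normalize`, and `quantize_rtn` are hypothetical, the imbalance is assumed here to be the ratio of the largest to the smallest standard deviation over all rows and columns, and the quantizer is a plain per-row min-max round-to-nearest scheme chosen only for illustration.

```python
import numpy as np


def imbalance(w):
    """Assumed proxy: largest std over smallest std across all rows and columns."""
    stds = np.concatenate([w.std(axis=1), w.std(axis=0)])
    return stds.max() / stds.min()


def dual_axis_normalize(w, iters=20, eps=1e-8):
    """Alternately divide rows and columns by their std (Sinkhorn-Knopp style).

    Returns (w_norm, row_scale, col_scale) such that
    w == row_scale[:, None] * w_norm * col_scale[None, :].
    """
    w_norm = np.array(w, dtype=np.float64)  # working copy
    row_scale = np.ones(w_norm.shape[0])
    col_scale = np.ones(w_norm.shape[1])
    for _ in range(iters):
        r = w_norm.std(axis=1) + eps        # per-row std
        w_norm /= r[:, None]
        row_scale *= r                      # accumulate so the product is preserved
        c = w_norm.std(axis=0) + eps        # per-column std
        w_norm /= c[None, :]
        col_scale *= c
    return w_norm, row_scale, col_scale


def quantize_rtn(w, bits=4):
    """Uniform round-to-nearest with a per-row min-max scale; returns dequantized values."""
    levels = 2**bits - 1
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    step = np.maximum(hi - lo, 1e-12) / levels
    q = np.clip(np.round((w - lo) / step), 0, levels)
    return q * step + lo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 48))
    w[:, 7] *= 20.0                         # synthetic outlier column

    w_norm, rs, cs = dual_axis_normalize(w)
    baseline = quantize_rtn(w)              # single-axis (per-row) scaling only
    dual = rs[:, None] * quantize_rtn(w_norm) * cs[None, :]

    print("imbalance before/after:", imbalance(w), imbalance(w_norm))
    print("mean abs error, per-row only:", np.abs(w - baseline).mean())
    print("mean abs error, dual-axis:   ", np.abs(w - dual).mean())
```

The point of the sketch is the bookkeeping: every division applied to the working matrix is mirrored in `row_scale` or `col_scale`, so the original weights are recovered exactly before quantization, and only the normalized matrix is rounded. Each layer is processed independently, which matches the abstract's claim that no calibration data or cross-layer interaction is needed.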
