SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

September 26, 2025
Authors: Lorenz K. Müller, Philippe Bich, Jiawei Zhuang, Ahmet Çelik, Luca Benfenati, Lukas Cavigelli
cs.AI

Abstract

Post-training quantization has emerged as the most widely used strategy for deploying large language models at low precision. Still, current methods show perplexity degradation at bit-widths less than or equal to 4, partly because representing outliers causes precision issues in parameters that share the same scales as these outliers. This problem is especially pronounced for calibration-free, uniform quantization methods. We introduce SINQ to augment existing post-training quantizers with an additional second-axis scale factor and a fast Sinkhorn-Knopp-style algorithm that finds scales to normalize per-row and per-column variances, thereby minimizing a novel per-matrix proxy target for quantization: the matrix imbalance. Our method has no interactions between layers and can be trivially applied to new architectures to quantize any linear layers. We evaluate our method on the Qwen3 model family and DeepSeek-V2.5. SINQ improves WikiText2 and C4 perplexity significantly against uncalibrated uniform quantization baselines and can be further enhanced by combining it with calibration and non-uniform quantization levels. Code to reproduce the results of this work and to easily quantize models using SINQ is available at https://github.com/huawei-csl/SINQ.
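The abstract describes the core procedure only at a high level: alternating per-row and per-column scale factors found by a Sinkhorn-Knopp-style iteration, followed by ordinary uniform quantization of the normalized weight matrix. The sketch below illustrates one plausible reading of that description. The function names, the fixed iteration count, the variance-based update rule, and the symmetric round-to-nearest quantizer are illustrative assumptions, not the authors' implementation; refer to https://github.com/huawei-csl/SINQ for the actual method.

```python
# Minimal sketch (assumptions noted above) of dual-axis scale normalization in the
# spirit of the Sinkhorn-Knopp-style procedure described in the abstract, followed
# by plain round-to-nearest uniform quantization of the normalized matrix.
import numpy as np

def sinkhorn_style_scales(W, n_iter=10, eps=1e-8):
    """Alternately normalize per-row and per-column standard deviations of W,
    accumulating scales so that W ≈ diag(r) @ W_norm @ diag(c)."""
    W_norm = W.astype(np.float64).copy()
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    for _ in range(n_iter):
        row_std = W_norm.std(axis=1) + eps      # per-row spread
        W_norm /= row_std[:, None]
        r *= row_std
        col_std = W_norm.std(axis=0) + eps      # per-column spread
        W_norm /= col_std[None, :]
        c *= col_std
    return W_norm, r, c

def quantize_uniform(W_norm, bits=4):
    """Symmetric round-to-nearest uniform quantization (illustrative quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W_norm).max() / qmax
    Q = np.clip(np.round(W_norm / scale), -qmax, qmax)
    return Q, scale

# Usage: the dequantized weight is diag(r) @ (Q * scale) @ diag(c).
W = np.random.randn(64, 64)
W_norm, r, c = sinkhorn_style_scales(W)
Q, s = quantize_uniform(W_norm, bits=4)
W_hat = (r[:, None] * (Q * s)) * c[None, :]
print("mean absolute reconstruction error:", np.abs(W - W_hat).mean())
```

The point of the two-sided scaling is that an outlier weight only inflates the quantization scale of its own row-column pair rather than an entire row, which is consistent with the abstract's motivation of reducing the precision loss caused by parameters sharing a scale with outliers.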