SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

December 4, 2025
Authors: Wenhua Cheng, Weiwei Zhang, Heng Guo, Haihao Shen
cs.AI

Abstract

Extreme low-bit quantization is critical for deploying Large Language Models (LLMs) efficiently, yet it often leads to severe performance degradation at 2 bits and even at 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework that is highly effective even without mixed precision. SignRoundV2 introduces (1) a fast sensitivity metric that combines gradient information with quantization-induced deviations to guide layer-wise bit allocation, and (2) a lightweight pre-tuning search for quantization scales that improves extremely low-bit quantization. Together, these components allow SignRoundV2 to close the gap with full-precision models. Extensive experiments indicate that our method sustains competitive accuracy, achieving production-grade performance with about 1 percent variance at 4-5 bits and strong results even at 2 bits. The implementation is available at https://github.com/intel/auto-round.
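
To make the two components concrete, below is a minimal PyTorch sketch of how they could look. It is an illustration under assumptions, not the authors' implementation (which lives in the auto-round repository): the function names, the exact metric (gradient times quantization-induced weight deviation), and the grid-search objective are all hypothetical readings of the abstract.

```python
# Minimal, assumption-laden sketch of (1) a gradient-times-deviation
# sensitivity metric and (2) a small pre-tuning search over scales.
import torch


def fake_quantize(w: torch.Tensor, scale: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric round-to-nearest fake quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale


def sensitivity_score(weight: torch.Tensor, grad: torch.Tensor, bits: int) -> float:
    """Hypothetical sensitivity metric: a first-order estimate of the loss
    change, combining the layer's gradient with the deviation that
    quantization introduces into its weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-12) / qmax  # avoid divide-by-zero
    deviation = fake_quantize(weight, scale, bits) - weight
    return (grad * deviation).abs().sum().item()


def search_scale(weight: torch.Tensor, bits: int,
                 multipliers: torch.Tensor = None) -> torch.Tensor:
    """Hypothetical lightweight pre-tuning: grid-search a scale multiplier
    that minimizes quantization reconstruction error before tuning."""
    if multipliers is None:
        multipliers = torch.linspace(0.8, 1.2, steps=21)
    qmax = 2 ** (bits - 1) - 1
    base = weight.abs().max().clamp(min=1e-12) / qmax
    best_scale, best_err = base, float("inf")
    for m in multipliers:
        scale = base * m
        err = (fake_quantize(weight, scale, bits) - weight).pow(2).sum().item()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

Under this reading, layers with higher sensitivity scores would receive more bits during layer-wise allocation, and the searched scales would seed the subsequent quantization-parameter tuning; the actual metric and search objective in SignRoundV2 may differ, so consult the repository for the real implementation.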