Scaling Law for Quantization-Aware Training

May 20, 2025
作者: Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng, Zeyue Xue, Zhiheng Liu, Yunshui Li, Jin Ma, Jie Huang, Xun Zhou, Ping Luo
cs.AI

Abstract

Large language models (LLMs) demand substantial computational and memory resources, creating deployment challenges. Quantization-aware training (QAT) addresses these challenges by reducing model precision while maintaining performance. However, the scaling behavior of QAT, especially at 4-bit precision (W4A4), is not well understood. Existing QAT scaling laws often ignore key factors such as the number of training tokens and quantization granularity, which limits their applicability. This paper proposes a unified scaling law for QAT that models quantization error as a function of model size, training data volume, and quantization group size. Through 268 QAT experiments, we show that quantization error decreases as model size increases, but rises with more training tokens and coarser quantization granularity. To identify the sources of W4A4 quantization error, we decompose it into weight and activation components. Both components follow the overall trend of W4A4 quantization error, but with different sensitivities. Specifically, weight quantization error increases more rapidly with more training tokens. Further analysis shows that the activation quantization error in the FC2 layer, caused by outliers, is the primary bottleneck of W4A4 QAT quantization error. By applying mixed-precision quantization to address this bottleneck, we demonstrate that weight and activation quantization errors can converge to similar levels. Additionally, with more training data, weight quantization error eventually exceeds activation quantization error, suggesting that reducing weight quantization error is also important in such scenarios. These findings offer key insights for improving QAT research and development.
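As a rough illustration of the kind of unified scaling law the abstract describes, the sketch below writes quantization error as a function of model size N, training tokens D, and quantization group size G. The power-law form, the function name `qat_quant_error`, and every coefficient are assumptions chosen only to reproduce the directional trends stated above (error falls with N, rises with D and with coarser groups G); the paper's actual fitted law and constants are not reproduced here.

```python
# A minimal sketch of the *shape* of a unified QAT scaling law, NOT the
# paper's fitted law. The hypothetical power-law form and placeholder
# coefficients only encode the directional trends reported in the abstract.

def qat_quant_error(n_params: float, n_tokens: float, group_size: int,
                    k: float = 1.0, alpha: float = 0.1,
                    beta: float = 0.2, gamma: float = 0.3) -> float:
    """Hypothetical predictor delta(N, D, G) = k * D^alpha * G^gamma / N^beta."""
    return k * (n_tokens ** alpha) * (group_size ** gamma) / (n_params ** beta)


if __name__ == "__main__":
    base = qat_quant_error(n_params=1e9, n_tokens=100e9, group_size=128)
    # Larger model -> smaller predicted quantization error.
    assert qat_quant_error(7e9, 100e9, 128) < base
    # More training tokens -> larger predicted quantization error.
    assert qat_quant_error(1e9, 500e9, 128) > base
    # Coarser quantization (larger group size) -> larger predicted error.
    assert qat_quant_error(1e9, 100e9, 256) > base
```

In the paper itself, such a law is fit to 268 QAT experiments and then decomposed into separate weight and activation error components, which follow the same overall trends with different sensitivities.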
