Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

January 30, 2026
Authors: Andrei Panferov, Erik Schultheis, Soroush Tabesh, Dan Alistarh
cs.AI

Abstract

The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation by stochastic rounding (SR), losing noticeable accuracy relative to standard FP16 and FP8 training. In this paper, we improve the state of the art for quantized training in NVFP4 via a novel unbiased quantization routine for micro-scaled formats, called MS-EDEN, that has more than 2x lower quantization error than SR. We integrate it into a novel fully-NVFP4 quantization scheme for linear layers, called Quartet II. We show analytically that Quartet II achieves consistently better gradient estimation across all major matrix multiplications, on both the forward and the backward passes. In addition, our proposal synergizes well with recent training improvements aimed specifically at NVFP4. We further validate Quartet II on end-to-end LLM training with up to 1.9B parameters on 38B tokens. We provide kernels for execution on NVIDIA Blackwell GPUs with up to 4.2x speedup over BF16. Our code is available at https://github.com/IST-DASLab/Quartet-II .
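The contrast the abstract draws between deterministic rounding and unbiased stochastic rounding (SR) can be illustrated with a minimal pure-Python sketch of NVFP4-style fake quantization. It assumes the commonly described NVFP4 layout (FP4 E2M1 elements sharing one scale per 16-element micro-block) but simplifies it: the block scale is kept in full precision rather than FP8 E4M3, and the per-tensor scale is omitted. The helper names (`block_scale`, `round_nearest`, `round_stochastic`, `quantize_blocks`) are hypothetical. This shows only the SR baseline, not MS-EDEN or the Quartet II kernels.

```python
import random

# Non-negative magnitudes of the FP4 E2M1 element type used by NVFP4.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale factor per 16-element micro-block


def block_scale(block):
    """Map the block's largest magnitude onto the top grid value (6.0).

    Simplification (assumption): the scale stays in full precision instead of
    being rounded to FP8 E4M3, and no second-level per-tensor scale is used.
    """
    amax = max(abs(v) for v in block)
    return amax / E2M1_GRID[-1] if amax > 0 else 1.0


def round_nearest(x):
    """Deterministic round-to-nearest onto the signed E2M1 grid (biased).

    Ties resolve toward the smaller magnitude; values beyond 6 saturate.
    """
    mag = min(E2M1_GRID, key=lambda g: abs(g - abs(x)))
    return mag if x >= 0 else -mag


def round_stochastic(x, rng):
    """Stochastic rounding onto the signed E2M1 grid.

    Picks one of the two neighbouring grid points with probabilities chosen
    so that the expected output equals x (unbiased) for |x| <= 6.
    """
    a = abs(x)
    mag = E2M1_GRID[-1]  # saturate if |x| exceeds the grid (e.g. float round-off)
    for lo, hi in zip(E2M1_GRID, E2M1_GRID[1:]):
        if lo <= a <= hi:
            p_up = (a - lo) / (hi - lo)
            mag = hi if rng.random() < p_up else lo
            break
    return mag if x >= 0 else -mag


def quantize_blocks(values, round_fn):
    """Fake-quantize `values` block by block and return dequantized floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        s = block_scale(block)
        out.extend(round_fn(v / s) * s for v in block)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(BLOCK_SIZE)]
    rtn = quantize_blocks(x, round_nearest)

    # Average many independent SR draws: the mean should approach x.
    trials = 10000
    sr_mean = [0.0] * len(x)
    for _ in range(trials):
        q = quantize_blocks(x, lambda v: round_stochastic(v, rng))
        sr_mean = [m + qi / trials for m, qi in zip(sr_mean, q)]

    print("max |RTN(x) - x|  =", max(abs(a - b) for a, b in zip(rtn, x)))
    print("max |mean SR - x| =", max(abs(a - b) for a, b in zip(sr_mean, x)))
```

Round-to-nearest always picks the closest grid point, so each individual sample has the smallest possible error, but that error is systematic. A single SR draw is never closer than round-to-nearest, yet its average converges to the input, which is the unbiasedness that gradient estimators rely on. Per the abstract, MS-EDEN keeps unbiasedness while reaching more than 2x lower quantization error than SR; its construction is not reproduced here.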
February 3, 2026