ECO: Quantized Training without Full-Precision Master Weights

January 29, 2026
Authors: Mahdi Nikdan, Amir Zandieh, Dan Alistarh, Vahab Mirrokni
cs.AI

Abstract

Quantization has significantly improved the compute and memory efficiency of Large Language Model (LLM) training. However, existing approaches still rely on accumulating their updates in high precision: concretely, gradient updates must be applied to a high-precision weight buffer, known as the master weights. This buffer introduces substantial memory overhead, particularly for Sparse Mixture of Experts (SMoE) models, where model parameters and optimizer states dominate memory usage. To address this, we introduce the Error-Compensating Optimizer (ECO), which eliminates master weights by applying updates directly to the quantized parameters. ECO quantizes the weights after each step and carefully injects the resulting quantization error into the optimizer momentum, forming an error-feedback loop with no additional memory. We prove that, under standard assumptions and a decaying learning rate, ECO converges to a constant-radius neighborhood of the optimum, whereas naive master-weight removal can incur an error that is inversely proportional to the learning rate. We show empirical results for pretraining small Transformers (30-800M), a Gemma-3 1B model, and a 2.1B-parameter Sparse MoE model with FP8 quantization, and for fine-tuning DeepSeek-MoE-16B in INT4 precision. Throughout, ECO matches master-weight baselines to near-lossless accuracy, significantly shifting the static-memory vs. validation-loss Pareto frontier.
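
To make the mechanism concrete, the sketch below shows one way the update described in the abstract could look, using plain momentum SGD and a toy round-to-nearest quantizer in PyTorch. The function names (quantize, eco_like_step), the choice of optimizer, the per-tensor quantizer, and the exact sign and scaling of the injected error are illustrative assumptions based only on this abstract, not the authors' implementation, which targets FP8 and INT4 formats.

import torch


def quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Toy symmetric per-tensor round-to-nearest quantizer (a stand-in for FP8/INT4).
    levels = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-12) / levels
    return (w / scale).round().clamp(-levels, levels) * scale


def eco_like_step(w_q, grad, momentum, lr=1e-2, beta=0.9, num_bits=8):
    # One momentum-SGD update applied directly to quantized weights: no
    # full-precision master copy of w_q persists across steps.
    momentum = beta * momentum + grad        # standard momentum accumulation
    w_new = w_q - lr * momentum              # tentative full-precision update (transient)
    w_q_next = quantize(w_new, num_bits)     # only this low-precision tensor is stored
    error = w_new - w_q_next                 # what rounding discarded this step
    momentum = momentum - error / lr         # assumed injection: the next -lr*momentum
                                             # re-applies a beta-scaled copy of the error
    return w_q_next, momentum


if __name__ == "__main__":
    # Toy check on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    w = quantize(torch.randn(1024))
    m = torch.zeros_like(w)
    for _ in range(200):
        w, m = eco_like_step(w, grad=w, momentum=m)
    print(float(w.norm()))  # should shrink toward a small neighborhood of zero

The point of the sketch is the memory accounting claimed in the abstract: the only persistent state is the quantized weight tensor and the momentum buffer that the optimizer keeps anyway, so the error-feedback loop adds no extra memory.
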