Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
February 3, 2026
Authors: Yinggan Xu, Risto Miikkulainen, Xin Qiu
cs.AI
Abstract
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients; they therefore cannot be applied to quantized models, whose parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimates. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible. It therefore opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
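To make the two ideas named in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' implementation; see the linked repository for that) of an antithetic ES step on a toy quantized parameter vector. The function names, the uniform-rounding quantizer, and all hyperparameter values are illustrative assumptions. It shows (1) accumulated error feedback, where the precision lost when the ES update is re-quantized is carried over to the next step instead of being discarded, and (2) stateless seed replay, where perturbations are regenerated from integer seeds so only seeds and scalar rewards need to be stored, never the full noise tensors.

```python
# Illustrative sketch only -- not the QES implementation from the paper.
import numpy as np

def quantize(w, scale=0.1):
    """Uniform rounding to a fixed grid; stand-in for a real PTQ scheme."""
    return np.round(w / scale) * scale

def es_step_quantized(w_q, residual, reward_fn, sigma=0.05, lr=0.02, pop=32, step=0):
    """One antithetic ES step performed directly on quantized weights w_q."""
    seeds, rewards = [], []
    for i in range(pop):
        seed = step * pop + i                                   # deterministic seed per member
        eps = np.random.default_rng(seed).standard_normal(w_q.shape)
        rewards.append(reward_fn(quantize(w_q + sigma * eps)))  # evaluate +eps in quantized space
        rewards.append(reward_fn(quantize(w_q - sigma * eps)))  # evaluate -eps in quantized space
        seeds.append(seed)                                      # keep only the seed, not eps

    # Stateless seed replay: regenerate each perturbation from its seed to form
    # the gradient estimate, so no noise tensor had to be held in memory.
    grad = np.zeros_like(w_q)
    for i, seed in enumerate(seeds):
        eps = np.random.default_rng(seed).standard_normal(w_q.shape)
        grad += (rewards[2 * i] - rewards[2 * i + 1]) * eps
    grad /= (2 * pop * sigma)

    # Accumulated error feedback: apply the high-precision update plus the
    # carried-over residual, re-quantize, and keep the new rounding error.
    target = w_q + lr * grad + residual
    w_q_new = quantize(target)
    residual = target - w_q_new
    return w_q_new, residual

# Toy usage: maximize reward = -||w - w_target||^2 over a quantized vector.
w_target = np.array([0.33, -1.27, 0.81])
reward = lambda w: -float(np.sum((w - w_target) ** 2))
w, r = quantize(np.zeros(3)), np.zeros(3)
for t in range(200):
    w, r = es_step_quantized(w, r, reward, step=t)
```

In this sketch the residual is what keeps sub-grid update information from being rounded away at every step, which is the intuition behind preserving "high-precision gradient signals" while the stored weights stay quantized.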