

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

February 3, 2026
Authors: Yinggan Xu, Risto Miikkulainen, Xin Qiu
cs.AI

Abstract

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients, so they cannot be applied to quantized models, whose parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradients. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible. It thereby opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
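To make the two ideas named in the abstract concrete, here is a minimal, self-contained sketch (not the authors' implementation): antithetic Evolution Strategies on a toy int8-quantized parameter vector, where an accumulated error-feedback buffer keeps the sub-step-size part of each high-precision update, and perturbations are regenerated from integer seeds ("stateless seed replay") instead of being stored. The objective, dimensions, and hyperparameters are illustrative assumptions; the actual method operates on LLM weights with task-level rewards.

```python
import numpy as np

# Illustrative sketch only: ES fine-tuning of an int8-quantized vector with
# error feedback and stateless seed replay. Not the QES reference code.

rng = np.random.default_rng(0)

D = 64                      # toy parameter count (assumption)
SCALE = 0.05                # per-tensor symmetric quantization step (assumption)
q_weights = rng.integers(-8, 8, size=D).astype(np.int8)   # quantized weights
error_buf = np.zeros(D, dtype=np.float32)                 # accumulated error feedback

target = rng.standard_normal(D).astype(np.float32)        # toy objective only

def dequantize(q):
    return q.astype(np.float32) * SCALE

def fitness(w):
    # Toy fitness: negative squared distance to a fixed target vector.
    return -float(np.sum((w - target) ** 2))

def perturbation(seed):
    # Stateless seed replay: the same seed always reproduces the same noise,
    # so only the scalar seed (not the D-dimensional vector) must be kept.
    return np.random.default_rng(seed).standard_normal(D).astype(np.float32)

POP, SIGMA, LR = 16, 0.1, 0.2   # illustrative hyperparameters

for step in range(200):
    base = dequantize(q_weights)
    seeds = rng.integers(0, 2**31 - 1, size=POP)
    scores = []
    for s in seeds:
        eps = perturbation(int(s))
        # Antithetic evaluation of +eps and -eps around the dequantized weights.
        scores.append((fitness(base + SIGMA * eps), fitness(base - SIGMA * eps)))

    # ES gradient estimate, replaying each perturbation from its seed.
    grad = np.zeros(D, dtype=np.float32)
    for s, (f_pos, f_neg) in zip(seeds, scores):
        grad += (f_pos - f_neg) * perturbation(int(s))
    grad /= 2 * POP * SIGMA

    # Error feedback: add the high-precision update to the buffer, commit only
    # the part representable at int8 resolution, and carry the remainder over.
    error_buf += LR * grad
    q_step = np.round(error_buf / SCALE)
    q_weights = np.clip(q_weights + q_step, -128, 127).astype(np.int8)
    error_buf -= q_step * SCALE

print("final fitness:", fitness(dequantize(q_weights)))
```

Because updates are committed only as integer multiples of the quantization step while the residual stays in the buffer, no gradient signal is lost to rounding, and because noise vectors are replayed from seeds, memory stays at the level of low-precision inference.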