Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
September 1, 2025
Authors: Natalia Frumkin, Diana Marculescu
cs.AI
Abstract
Text-to-image diffusion models are computationally intensive, often requiring
dozens of forward passes through large transformer backbones. For instance,
Stable Diffusion XL generates high-quality images with 50 evaluations of a
2.6B-parameter model, an expensive process even for a single batch. Few-step
diffusion models reduce this cost to 2-8 denoising steps but still depend on
large, uncompressed U-Net or diffusion transformer backbones, which are often
too costly for full-precision inference without datacenter GPUs. These
requirements also limit existing post-training quantization methods that rely
on full-precision calibration. We introduce Q-Sched, a new paradigm for
post-training quantization that modifies the diffusion model scheduler rather
than model weights. By adjusting the few-step sampling trajectory, Q-Sched
achieves full-precision accuracy with a 4x reduction in model size. To learn
quantization-aware pre-conditioning coefficients, we propose the JAQ loss,
which combines text-image compatibility with an image quality metric for
fine-grained optimization. JAQ is reference-free and requires only a handful of
calibration prompts, avoiding full-precision inference during calibration.
Q-Sched delivers substantial gains: a 15.5% FID improvement over the FP16
4-step Latent Consistency Model and a 16.6% improvement over the FP16 8-step
Phased Consistency Model, showing that quantization and few-step distillation
are complementary for high-fidelity generation. A large-scale user study with
more than 80,000 annotations further confirms Q-Sched's effectiveness on both
FLUX.1[schnell] and SDXL-Turbo.
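
The abstract describes two ideas: calibrating the scheduler's pre-conditioning coefficients instead of the quantized backbone's weights, and scoring candidate coefficients with a reference-free objective (JAQ) that mixes text-image compatibility with an image-quality metric. The snippet below is a minimal sketch of that recipe under stated assumptions, not the authors' implementation: `run_few_step_sampler`, `clip_compatibility`, and `quality_score` are hypothetical placeholder callables, and the grid search over per-step scales is only an illustrative calibration strategy.

```python
# Illustrative sketch (not the paper's code) of JAQ-style, reference-free
# calibration of scheduler pre-conditioning coefficients for a quantized
# few-step sampler. Placeholders: run_few_step_sampler, clip_compatibility,
# quality_score are hypothetical callables supplied by the user.

import itertools
import torch


def jaq_style_loss(images, prompts, clip_compatibility, quality_score,
                   alpha=1.0, beta=1.0):
    """Reference-free objective (higher is better).

    clip_compatibility(images, prompts) -> scalar text-image score
    quality_score(images)               -> scalar no-reference IQA score
    alpha, beta weight the two terms (illustrative values).
    """
    return alpha * clip_compatibility(images, prompts) + beta * quality_score(images)


def calibrate_preconditioning(run_few_step_sampler, prompts,
                              clip_compatibility, quality_score,
                              candidate_scales=(0.9, 0.95, 1.0, 1.05, 1.1),
                              num_steps=4):
    """Grid-search per-step scalings of the scheduler's pre-conditioning
    coefficients; the quantized backbone's weights are never modified."""
    best_coeffs, best_score = None, float("-inf")
    for coeffs in itertools.product(candidate_scales, repeat=num_steps):
        # The sampler is assumed to apply coeffs[t] to the scheduler's
        # pre-conditioning at denoising step t and return a batch of images
        # generated with the quantized backbone.
        images = run_few_step_sampler(prompts, torch.tensor(coeffs))
        score = jaq_style_loss(images, prompts, clip_compatibility, quality_score)
        if score > best_score:
            best_coeffs, best_score = torch.tensor(coeffs), score
    return best_coeffs, best_score
```

Because the objective needs only a handful of calibration prompts and no reference images, a sketch like this never has to run the full-precision model during calibration, which is the property the abstract highlights.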