Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
September 1, 2025
Authors: Natalia Frumkin, Diana Marculescu
cs.AI
Abstract
Text-to-image diffusion models are computationally intensive, often requiring
dozens of forward passes through large transformer backbones. For instance,
Stable Diffusion XL generates high-quality images with 50 evaluations of a
2.6B-parameter model, an expensive process even for a single batch. Few-step
diffusion models reduce this cost to 2-8 denoising steps but still depend on
large, uncompressed U-Net or diffusion transformer backbones, which are often
too costly for full-precision inference without datacenter GPUs. These
requirements also limit existing post-training quantization methods that rely
on full-precision calibration. We introduce Q-Sched, a new paradigm for
post-training quantization that modifies the diffusion model scheduler rather
than model weights. By adjusting the few-step sampling trajectory, Q-Sched
achieves full-precision accuracy with a 4x reduction in model size. To learn
quantization-aware pre-conditioning coefficients, we propose the JAQ loss,
which combines text-image compatibility with an image quality metric for
fine-grained optimization. JAQ is reference-free and requires only a handful of
calibration prompts, avoiding full-precision inference during calibration.
Q-Sched delivers substantial gains: a 15.5% FID improvement over the FP16
4-step Latent Consistency Model and a 16.6% improvement over the FP16 8-step
Phased Consistency Model, showing that quantization and few-step distillation
are complementary for high-fidelity generation. A large-scale user study with
more than 80,000 annotations further confirms Q-Sched's effectiveness on both
FLUX.1[schnell] and SDXL-Turbo.
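
The abstract describes two ideas: calibrating the scheduler's pre-conditioning coefficients instead of the quantized backbone's weights, and scoring candidate coefficients with a reference-free objective (JAQ) that mixes text-image compatibility with an image-quality metric. The snippet below is a minimal sketch of that recipe under stated assumptions, not the authors' implementation: `run_few_step_sampler`, `clip_compatibility`, and `quality_score` are hypothetical placeholder callables, and the grid search over per-step scales is only an illustrative calibration strategy.

```python
# Illustrative sketch (not the paper's code) of JAQ-style, reference-free
# calibration of scheduler pre-conditioning coefficients for a quantized
# few-step sampler. Placeholders: run_few_step_sampler, clip_compatibility,
# quality_score are hypothetical callables supplied by the user.

import itertools
import torch


def jaq_style_loss(images, prompts, clip_compatibility, quality_score,
                   alpha=1.0, beta=1.0):
    """Reference-free objective (higher is better).

    clip_compatibility(images, prompts) -> scalar text-image score
    quality_score(images)               -> scalar no-reference IQA score
    alpha, beta weight the two terms (illustrative values).
    """
    return alpha * clip_compatibility(images, prompts) + beta * quality_score(images)


def calibrate_preconditioning(run_few_step_sampler, prompts,
                              clip_compatibility, quality_score,
                              candidate_scales=(0.9, 0.95, 1.0, 1.05, 1.1),
                              num_steps=4):
    """Grid-search per-step scalings of the scheduler's pre-conditioning
    coefficients; the quantized backbone's weights are never modified."""
    best_coeffs, best_score = None, float("-inf")
    for coeffs in itertools.product(candidate_scales, repeat=num_steps):
        # The sampler is assumed to apply coeffs[t] to the scheduler's
        # pre-conditioning at denoising step t and return a batch of images
        # generated with the quantized backbone.
        images = run_few_step_sampler(prompts, torch.tensor(coeffs))
        score = jaq_style_loss(images, prompts, clip_compatibility, quality_score)
        if score > best_score:
            best_coeffs, best_score = torch.tensor(coeffs), score
    return best_coeffs, best_score
```

Because the objective needs only a handful of calibration prompts and no reference images, a sketch like this never has to run the full-precision model during calibration, which is the property the abstract highlights.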