Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling

September 1, 2025
Authors: Natalia Frumkin, Diana Marculescu
cs.AI

Abstract

Text-to-image diffusion models are computationally intensive, often requiring dozens of forward passes through large transformer backbones. For instance, Stable Diffusion XL generates high-quality images with 50 evaluations of a 2.6B-parameter model, an expensive process even for a single batch. Few-step diffusion models reduce this cost to 2-8 denoising steps but still depend on large, uncompressed U-Net or diffusion transformer backbones, which are often too costly for full-precision inference without datacenter GPUs. These requirements also limit existing post-training quantization methods that rely on full-precision calibration. We introduce Q-Sched, a new paradigm for post-training quantization that modifies the diffusion model scheduler rather than model weights. By adjusting the few-step sampling trajectory, Q-Sched achieves full-precision accuracy with a 4x reduction in model size. To learn quantization-aware pre-conditioning coefficients, we propose the JAQ loss, which combines text-image compatibility with an image quality metric for fine-grained optimization. JAQ is reference-free and requires only a handful of calibration prompts, avoiding full-precision inference during calibration. Q-Sched delivers substantial gains: a 15.5% FID improvement over the FP16 4-step Latent Consistency Model and a 16.6% improvement over the FP16 8-step Phased Consistency Model, showing that quantization and few-step distillation are complementary for high-fidelity generation. A large-scale user study with more than 80,000 annotations further confirms Q-Sched's effectiveness on both FLUX.1[schnell] and SDXL-Turbo.
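
The abstract describes the method only at a high level. As a rough illustration (not the authors' implementation), the PyTorch sketch below shows what calibrating learnable per-step pre-conditioning coefficients against a JAQ-style, reference-free objective could look like. The names sample_fn, quantized_denoiser, clip_score, and nr_iqa_score are hypothetical stand-ins for the few-step sampler, the quantized diffusion backbone, a text-image compatibility metric, and a no-reference image quality metric; only the scheduler coefficients are optimized, never the model weights.

# Minimal sketch of Q-Sched-style scheduler calibration; hypothetical names,
# not the authors' code.
import torch

def jaq_loss(images, prompts, clip_score, nr_iqa_score, alpha=0.5):
    """Reference-free objective: reward text-image compatibility and image quality."""
    compat = clip_score(images, prompts)   # text-image compatibility term (assumed callable)
    quality = nr_iqa_score(images)         # no-reference image quality term (assumed callable)
    return -(alpha * compat + (1.0 - alpha) * quality).mean()

def calibrate_scheduler(sample_fn, quantized_denoiser, prompts,
                        clip_score, nr_iqa_score, num_steps=4,
                        iters=100, lr=1e-2):
    """Learn per-step pre-conditioning coefficients for a few-step sampler.

    sample_fn(denoiser, prompts, coeffs) is assumed to run the few-step
    sampling trajectory, scaling each step with the corresponding coefficient,
    and to return a differentiable batch of images.
    """
    # One multiplicative coefficient per denoising step, initialized to 1.0
    # (i.e., the unmodified schedule).
    coeffs = torch.nn.Parameter(torch.ones(num_steps))
    opt = torch.optim.Adam([coeffs], lr=lr)

    for _ in range(iters):
        images = sample_fn(quantized_denoiser, prompts, coeffs)
        loss = jaq_loss(images, prompts, clip_score, nr_iqa_score)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return coeffs.detach()  # plug back into the scheduler at inference time

At inference, the learned coefficients simply rescale the few-step sampling trajectory of the quantized model, consistent with the abstract's claim that Q-Sched modifies the scheduler rather than the weights and needs only a handful of calibration prompts.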