Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling

September 1, 2025
Authors: Natalia Frumkin, Diana Marculescu
cs.AI

Abstract

Text-to-image diffusion models are computationally intensive, often requiring dozens of forward passes through large transformer backbones. For instance, Stable Diffusion XL generates high-quality images with 50 evaluations of a 2.6B-parameter model, an expensive process even for a single batch. Few-step diffusion models reduce this cost to 2-8 denoising steps but still depend on large, uncompressed U-Net or diffusion transformer backbones, which are often too costly for full-precision inference without datacenter GPUs. These requirements also limit existing post-training quantization methods that rely on full-precision calibration. We introduce Q-Sched, a new paradigm for post-training quantization that modifies the diffusion model scheduler rather than model weights. By adjusting the few-step sampling trajectory, Q-Sched achieves full-precision accuracy with a 4x reduction in model size. To learn quantization-aware pre-conditioning coefficients, we propose the JAQ loss, which combines text-image compatibility with an image quality metric for fine-grained optimization. JAQ is reference-free and requires only a handful of calibration prompts, avoiding full-precision inference during calibration. Q-Sched delivers substantial gains: a 15.5% FID improvement over the FP16 4-step Latent Consistency Model and a 16.6% improvement over the FP16 8-step Phased Consistency Model, showing that quantization and few-step distillation are complementary for high-fidelity generation. A large-scale user study with more than 80,000 annotations further confirms Q-Sched's effectiveness on both FLUX.1[schnell] and SDXL-Turbo.
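
The abstract describes the method only at a high level. As a rough illustration (not the authors' implementation), the PyTorch sketch below shows what calibrating learnable per-step pre-conditioning coefficients against a JAQ-style, reference-free objective could look like. The names sample_fn, quantized_denoiser, clip_score, and nr_iqa_score are hypothetical stand-ins for the few-step sampler, the quantized diffusion backbone, a text-image compatibility metric, and a no-reference image quality metric; only the scheduler coefficients are optimized, never the model weights.

# Minimal sketch of Q-Sched-style scheduler calibration; hypothetical names,
# not the authors' code.
import torch

def jaq_loss(images, prompts, clip_score, nr_iqa_score, alpha=0.5):
    """Reference-free objective: reward text-image compatibility and image quality."""
    compat = clip_score(images, prompts)   # text-image compatibility term (assumed callable)
    quality = nr_iqa_score(images)         # no-reference image quality term (assumed callable)
    return -(alpha * compat + (1.0 - alpha) * quality).mean()

def calibrate_scheduler(sample_fn, quantized_denoiser, prompts,
                        clip_score, nr_iqa_score, num_steps=4,
                        iters=100, lr=1e-2):
    """Learn per-step pre-conditioning coefficients for a few-step sampler.

    sample_fn(denoiser, prompts, coeffs) is assumed to run the few-step
    sampling trajectory, scaling each step with the corresponding coefficient,
    and to return a differentiable batch of images.
    """
    # One multiplicative coefficient per denoising step, initialized to 1.0
    # (i.e., the unmodified schedule).
    coeffs = torch.nn.Parameter(torch.ones(num_steps))
    opt = torch.optim.Adam([coeffs], lr=lr)

    for _ in range(iters):
        images = sample_fn(quantized_denoiser, prompts, coeffs)
        loss = jaq_loss(images, prompts, clip_score, nr_iqa_score)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return coeffs.detach()  # plug back into the scheduler at inference time

At inference, the learned coefficients simply rescale the few-step sampling trajectory of the quantized model, consistent with the abstract's claim that Q-Sched modifies the scheduler rather than the weights and needs only a handful of calibration prompts.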