

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

November 27, 2025
作者: Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, Steven Hoi
cs.AI

Abstract

Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among these, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their core mechanism of matching the student's output distribution to that of a pre-trained teacher model. In this work, we challenge this conventional understanding. Through a rigorous decomposition of the DMD training objective, we reveal that in complex tasks like text-to-image generation, where CFG is typically required for desirable few-step performance, the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA). We demonstrate that this term acts as the core "engine" of distillation, while the Distribution Matching (DM) term functions as a "regularizer" that ensures training stability and mitigates artifacts. We further validate this decoupling by demonstrating that while the DM term is a highly effective regularizer, it is not unique; simpler non-parametric constraints or GAN-based objectives can serve the same stabilizing function, albeit with different trade-offs. This decoupling of labor motivates a more principled analysis of the properties of both terms, leading to a more systematic and in-depth understanding. This new understanding further enables us to propose principled modifications to the distillation process, such as decoupling the noise schedules for the engine and the regularizer, leading to further performance gains. Notably, our method has been adopted by the Z-Image (https://github.com/Tongyi-MAI/Z-Image) project to develop a top-tier 8-step image generation model, empirically validating the generalization and robustness of our findings.
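The decomposition the abstract describes can be illustrated with a small numeric sketch. Assuming the standard DMD update direction `s_fake - s_teacher` and the usual CFG combination `s_teacher = s_uncond + w * (s_cond - s_uncond)` (the function and variable names here are illustrative, not from the paper), the guided gradient splits algebraically into a Distribution Matching term against the conditional teacher score plus a CFG Augmentation term that vanishes at guidance scale `w = 1`:

```python
import numpy as np

def dmd_gradient_decomposed(s_fake, s_real_cond, s_real_uncond, w):
    """Split the CFG-guided DMD gradient direction into a Distribution
    Matching (DM) part and a CFG Augmentation (CA) part.

    With CFG, the teacher score is
        s_teacher = s_uncond + w * (s_cond - s_uncond),
    so the DMD direction s_fake - s_teacher decomposes exactly as
        (s_fake - s_cond) + (w - 1) * (s_uncond - s_cond).
    """
    s_teacher_cfg = s_real_uncond + w * (s_real_cond - s_real_uncond)
    full = s_fake - s_teacher_cfg                    # guided DMD direction
    dm = s_fake - s_real_cond                        # "regularizer": match conditional teacher
    ca = (w - 1.0) * (s_real_uncond - s_real_cond)   # "engine": pure CFG-augmentation component
    return full, dm, ca

# Toy check with random score vectors.
rng = np.random.default_rng(0)
s_fake, s_cond, s_uncond = rng.normal(size=(3, 8))
full, dm, ca = dmd_gradient_decomposed(s_fake, s_cond, s_uncond, w=7.5)
```

At `w = 1` (no guidance) the CA term is identically zero and the objective reduces to plain distribution matching, which is consistent with the paper's claim that CA only emerges when CFG is required.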