

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

November 27, 2025
Authors: Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, Steven Hoi
cs.AI

Abstract

Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among these, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their core mechanism of matching the student's output distribution to that of a pre-trained teacher model. In this work, we challenge this conventional understanding. Through a rigorous decomposition of the DMD training objective, we reveal that in complex tasks like text-to-image generation, where CFG is typically required for desirable few-step performance, the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA). We demonstrate that this term acts as the core "engine" of distillation, while the Distribution Matching (DM) term functions as a "regularizer" that ensures training stability and mitigates artifacts. We further validate this decoupling by demonstrating that while the DM term is a highly effective regularizer, it is not unique; simpler non-parametric constraints or GAN-based objectives can serve the same stabilizing function, albeit with different trade-offs. This division of labor motivates a more principled analysis of the properties of both terms, leading to a more systematic and in-depth understanding. This new understanding further enables us to propose principled modifications to the distillation process, such as decoupling the noise schedules for the engine and the regularizer, leading to further performance gains. Notably, our method has been adopted by the Z-Image project (https://github.com/Tongyi-MAI/Z-Image) to develop a top-tier 8-step image generation model, empirically validating the generalization and robustness of our findings.
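The decomposition the abstract alludes to can be made concrete. The following is a hedged sketch, not the paper's exact formulation: it assumes the standard DMD gradient with a CFG-guided teacher, where $w$ is the guidance scale and the $\epsilon$-prediction scores $\epsilon_{\text{fake}}$, $\epsilon_{\text{cond}}$, $\epsilon_{\text{uncond}}$ are notation introduced here for illustration. Under this parameterization, the teacher residual splits into a distribution-matching part and a CFG-augmentation part:

```latex
% Assumed DMD-style gradient with a CFG-guided teacher (guidance scale w).
% G_theta is the student generator; x_t is the noised student sample.
\nabla_\theta \mathcal{L}
  \;\propto\;
  \mathbb{E}\!\left[
    \big(\epsilon_{\text{fake}}(x_t,t) - \epsilon_{\text{cfg}}(x_t,t)\big)
    \,\frac{\partial G_\theta}{\partial \theta}
  \right],
\qquad
\epsilon_{\text{cfg}}
  = \epsilon_{\text{cond}} + (w-1)\big(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}}\big).

% The residual then separates into the two components named in the abstract:
\epsilon_{\text{fake}} - \epsilon_{\text{cfg}}
  = \underbrace{\big(\epsilon_{\text{fake}} - \epsilon_{\text{cond}}\big)}_{\text{DM (regularizer)}}
  \;-\;
  (w-1)\,\underbrace{\big(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}}\big)}_{\text{CA (engine)}}.
```

On this reading, setting $w=1$ removes the CA term entirely, consistent with the abstract's claim that CFG, rather than distribution matching alone, drives few-step distillation in text-to-image settings; it also makes the proposed decoupled noise schedules natural, since the two terms can be evaluated at different timesteps.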