Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
February 3, 2026
Authors: Tianhe Wu, Ruibin Li, Lei Zhang, Kede Ma
cs.AI
Abstract
Distribution matching distillation (DMD) aligns a multi-step generator with its few-step counterpart, enabling high-quality generation at low inference cost. However, DMD is prone to mode collapse, as its reverse-KL formulation inherently encourages mode-seeking behavior. Existing remedies typically rely on perceptual or adversarial regularization, incurring substantial computational overhead and training instability. In this work, we propose a role-separated distillation framework that explicitly disentangles the roles of the distilled steps: the first step is dedicated to preserving sample diversity via a target-prediction (e.g., v-prediction) objective, while subsequent steps focus on quality refinement under the standard DMD loss, with gradients from the DMD objective blocked at the first step. We term this approach Diversity-Preserved DMD (DP-DMD). Despite its simplicity (no perceptual backbone, no discriminator, no auxiliary networks, and no additional ground-truth images), DP-DMD preserves sample diversity while maintaining visual quality on par with state-of-the-art methods in extensive text-to-image experiments.
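The role separation described above can be sketched as a single training update: the first distilled step receives only a target-prediction (v-prediction-style) regression loss, and `detach()` blocks the DMD gradient from reaching it. This is a minimal illustrative sketch, not the authors' implementation; `TinyGenerator`, `dp_dmd_step`, and the placeholder `dmd_loss_fn` are hypothetical names, and the real DMD loss involves score networks omitted here.

```python
# Minimal sketch of a role-separated DP-DMD-style update (assumptions:
# a toy two-step generator and a placeholder DMD loss stand in for the
# real architecture and distribution-matching objective).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Toy stand-in for a few-step generator: one sub-network per step."""
    def __init__(self, dim=8):
        super().__init__()
        self.step_nets = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])

    def forward(self, x, step):
        return self.step_nets[step](x)

def dp_dmd_step(gen, noise, v_target, dmd_loss_fn):
    # Step 1: diversity-preserving target-prediction (v-prediction-style)
    # regression objective only.
    x1 = gen(noise, step=0)
    loss_div = F.mse_loss(x1, v_target)

    # detach() blocks gradients of the DMD objective at the first step,
    # so quality refinement cannot collapse the diversity of step 1.
    x2 = gen(x1.detach(), step=1)
    loss_quality = dmd_loss_fn(x2)

    return loss_div + loss_quality
```

Calling `loss.backward()` on the returned sum updates the step-1 network only through the regression term, while later steps are driven by the (placeholder) DMD loss.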