
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

February 3, 2026
Authors: Tianhe Wu, Ruibin Li, Lei Zhang, Kede Ma
cs.AI

Abstract

Distribution matching distillation (DMD) aligns a multi-step generator with its few-step counterpart to enable high-quality generation at low inference cost. However, DMD tends to suffer from mode collapse, as its reverse-KL formulation inherently encourages mode-seeking behavior. Existing remedies typically rely on perceptual or adversarial regularization, incurring substantial computational overhead and training instability. In this work, we propose a role-separated distillation framework that explicitly disentangles the roles of the distilled steps: the first step is dedicated to preserving sample diversity via a target-prediction (e.g., v-prediction) objective, while subsequent steps focus on quality refinement under the standard DMD loss, with gradients from the DMD objective blocked at the first step. We term this approach Diversity-Preserved DMD (DP-DMD). Despite its simplicity -- no perceptual backbone, no discriminator, no auxiliary networks, and no additional ground-truth images -- DP-DMD preserves sample diversity while maintaining visual quality on par with state-of-the-art methods in extensive text-to-image experiments.
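The key mechanism in the abstract — the DMD gradient being blocked at the first distilled step so that only the target-prediction loss trains it — can be sketched with a stop-gradient (`detach()` in PyTorch). This is a toy illustration, not the authors' implementation: the linear layers, the placeholder losses, and all names here are assumptions standing in for the real diffusion generator steps and the real v-prediction/DMD objectives.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

gen_step1 = torch.nn.Linear(4, 4)   # first distilled step: diversity role
gen_step2 = torch.nn.Linear(4, 4)   # subsequent step: quality-refinement role

z = torch.randn(8, 4)               # noise batch
v_target = torch.randn(8, 4)        # stand-in for the v-prediction target

x1 = gen_step1(z)

# DMD-style loss acts on later steps only; detach() blocks its gradient
# at the first step, so mode-seeking updates never reach gen_step1.
x2 = gen_step2(x1.detach())
loss_dmd = x2.pow(2).mean()         # placeholder for the actual DMD loss
loss_dmd.backward()
assert gen_step1.weight.grad is None        # gradient blocked at step 1
assert gen_step2.weight.grad is not None    # step 2 trained by DMD loss

# The first step is trained only by the diversity-preserving
# target-prediction objective (here, a toy MSE to v_target).
loss_v = F.mse_loss(x1, v_target)
loss_v.backward()
assert gen_step1.weight.grad is not None    # step 1 trained by v-prediction
```

The two `backward()` calls make the role separation explicit: after the DMD-style loss propagates, the first step's parameters have received no gradient at all; only the target-prediction loss updates them.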