

Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

October 31, 2025
作者: Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, Lei Yang
cs.AI

Abstract

Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators, without requiring a one-to-one correspondence with the sampling trajectories of their teachers. However, limited model capacity causes one-step distilled models to underperform on complex generative tasks, e.g., synthesizing intricate object motions in text-to-video generation. Directly extending DMD to multi-step distillation increases memory usage and computational depth, leading to instability and reduced efficiency. While prior works propose stochastic gradient truncation as a potential solution, we observe that it substantially reduces the generation diversity of multi-step distilled models, bringing it down to the level of their one-step counterparts. To address these limitations, we propose Phased DMD, a multi-step distillation framework that bridges the idea of phase-wise distillation with Mixture-of-Experts (MoE), reducing learning difficulty while enhancing model capacity. Phased DMD is built upon two key ideas: progressive distribution matching and score matching within subintervals. First, our model divides the SNR range into subintervals and progressively refines the model toward higher SNR levels, to better capture complex distributions. Next, to ensure that the training objective within each subinterval is accurate, we conduct rigorous mathematical derivations. We validate Phased DMD by distilling state-of-the-art image and video generation models, including Qwen-Image (20B parameters) and Wan2.2 (28B parameters). Experimental results demonstrate that Phased DMD preserves output diversity better than DMD while retaining key generative capabilities. We will release our code and models.
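The core structural idea — splitting the diffusion timestep/SNR range into contiguous subintervals and assigning each phase its own expert — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the names (`Phase`, `make_phases`, `phase_for_timestep`) and the uniform partition are assumptions for exposition.

```python
# Illustrative sketch of phase-wise partitioning for multi-step distillation.
# NOT the paper's implementation: names and the uniform split are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    t_start: int  # highest-noise timestep in this subinterval (inclusive)
    t_end: int    # lowest-noise timestep in this subinterval (exclusive)

def make_phases(num_train_timesteps: int, num_phases: int) -> List[Phase]:
    """Split the timestep range into contiguous subintervals.

    Phase 0 covers the lowest-SNR (highest-noise) timesteps; later phases
    move progressively toward higher SNR, mirroring the idea of progressive
    distribution matching over subintervals."""
    edges = [round(num_train_timesteps * (1 - k / num_phases))
             for k in range(num_phases + 1)]
    return [Phase(t_start=edges[k], t_end=edges[k + 1])
            for k in range(num_phases)]

def phase_for_timestep(phases: List[Phase], t: int) -> int:
    """Route a timestep to the expert owning its subinterval (MoE-style)."""
    for idx, p in enumerate(phases):
        if p.t_end <= t < p.t_start or (idx == 0 and t == p.t_start):
            return idx
    raise ValueError(f"timestep {t} outside training range")

phases = make_phases(num_train_timesteps=1000, num_phases=4)
# Four subintervals: [1000,750), [750,500), [500,250), [250,0)
```

In an actual distillation loop, each phase's generator would only be trained (and its score-matching objective evaluated) on noise levels drawn from its own subinterval, which is what reduces the per-expert learning difficulty.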