EM Distillation for One-step Diffusion Models

May 27, 2024
Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao
cs.AI

Abstract

While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process. We further reveal an interesting connection between our method and existing methods that minimize mode-seeking KL. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.
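
To make the alternating EM structure concrete, the sketch below shows one simplified distillation step. It is an illustration only, not the authors' implementation: `generator` and `teacher_score(x_t, t)` are hypothetical stand-ins (the latter approximating the teacher's noisy-data score), the noise schedule is a toy linear one, and the E-step here only corrects the noisy sample while keeping the latent fixed.

```python
# Illustrative sketch of an EM-style distillation step (NOT the paper's code).
# `teacher_score(x_t, t)` and `generator` are hypothetical stand-ins.
import torch

def emd_step(generator, teacher_score, opt, batch, dim, t=0.5,
             langevin_steps=4, step_size=1e-2):
    # Reparametrized sampling: one latent z and one shared noise eps couple the
    # generator output with its noisy version at time t.
    z = torch.randn(batch, dim)
    eps = torch.randn(batch, dim)
    alpha, sigma = 1.0 - t, t  # toy linear noise schedule, for illustration only

    # E-step (approximate): initialize the noisy sample from the generator, then
    # run a few unadjusted Langevin steps under the teacher score so that x_t
    # drifts toward the teacher's noisy marginal. (The paper's scheme also
    # updates the latent; z is kept fixed here for brevity.)
    with torch.no_grad():
        x0 = generator(z)
        x_t = alpha * x0 + sigma * eps
        for _ in range(langevin_steps):
            x_t = (x_t + step_size * teacher_score(x_t, t)
                   + (2.0 * step_size) ** 0.5 * torch.randn_like(x_t))

    # M-step: maximize log p_theta(x_t | z); with a Gaussian perturbation of the
    # generator output this reduces to an L2 fit. Reusing the SAME eps in the
    # target is loosely analogous to the paper's noise-cancellation technique:
    # the shared noise largely cancels in the residual, lowering gradient variance.
    opt.zero_grad()
    residual = alpha * generator(z) + sigma * eps - x_t
    loss = residual.pow(2).mean()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    dim = 8
    gen = torch.nn.Sequential(torch.nn.Linear(dim, 64), torch.nn.SiLU(),
                              torch.nn.Linear(64, dim))
    opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
    # Smoke-test "teacher": the score of a standard Gaussian, i.e. -x at every t.
    print(emd_step(gen, lambda x, t: -x, opt, batch=16, dim=dim))
```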
