
一步扩散模型与f-散度分布匹配

One-step Diffusion Models with f-Divergence Distribution Matching

February 21, 2025
Authors: Yilun Xu, Weili Nie, Arash Vahdat
cs.AI

Abstract

Sampling from diffusion models involves a slow iterative process that hinders their practical deployment, especially for interactive applications. To accelerate generation speed, recent approaches distill a multi-step diffusion model into a single-step student generator via variational score distillation, which matches the distribution of samples generated by the student to the teacher's distribution. However, these approaches use the reverse Kullback-Leibler (KL) divergence for distribution matching, which is known to be mode-seeking. In this paper, we generalize the distribution matching approach using a novel f-divergence minimization framework, termed f-distill, that covers different divergences with different trade-offs in terms of mode coverage and training variance. We derive the gradient of the f-divergence between the teacher and student distributions and show that it is expressed as the product of their score differences and a weighting function determined by their density ratio. This weighting function naturally emphasizes samples with higher density in the teacher distribution when using a less mode-seeking divergence. We observe that the popular variational score distillation approach using the reverse-KL divergence is a special case within our framework. Empirically, we demonstrate that alternative f-divergences, such as forward-KL and Jensen-Shannon divergences, outperform the current best variational score distillation methods across image generation tasks. In particular, when using Jensen-Shannon divergence, f-distill achieves current state-of-the-art one-step generation performance on ImageNet64 and zero-shot text-to-image generation on MS-COCO. Project page: https://research.nvidia.com/labs/genair/f-distill
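
To make the gradient described in the abstract concrete, the following is a minimal worked sketch, assuming the common parameterization D_f(p || q) = E_{x~q}[f(p(x)/q(x))] with convex f, f(1) = 0, where p denotes the teacher distribution, q_θ the one-step student, and x = G_θ(z) the student generator; the exact theorem statement and weighting functions are given in the paper.

\[
D_f(p \,\|\, q_\theta) = \mathbb{E}_{x \sim q_\theta}\!\left[ f\!\left(\frac{p(x)}{q_\theta(x)}\right) \right],
\qquad
\nabla_\theta D_f(p \,\|\, q_\theta) \;\propto\; \mathbb{E}_{z}\!\left[ \underbrace{f''\!\big(r(x)\big)\, r(x)^2}_{\text{weighting } h(r)} \,\big(\nabla_x \log q_\theta(x) - \nabla_x \log p(x)\big)^{\!\top} \frac{\partial G_\theta(z)}{\partial \theta} \right],
\quad r(x) = \frac{p(x)}{q_\theta(x)}.
\]

Under this convention, reverse KL corresponds to f(r) = -log r, giving a constant weight h(r) = 1 and reducing the update to standard variational score distillation; forward KL corresponds to f(r) = r log r, giving h(r) = r, which up-weights samples where the teacher density exceeds the student's; and Jensen-Shannon gives the bounded weight h(r) = r / (2(1 + r)), consistent with the mode-coverage and training-variance trade-off described in the abstract.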
