EM Distillation for One-step Diffusion Models

Abstract

Diffusion models can learn complex distributions, but sampling from them requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations: performance degrades when the number of sampling steps is very small, they rely on access to training data, or they use mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model into a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and the inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process. We further reveal a connection between our method and existing methods that minimize the mode-seeking KL divergence. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.
