Conditional Diffusion Distillation

Abstract

Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, a major limitation of diffusion models is their slow sampling. To address this, we present a novel conditional distillation method that supplements the diffusion priors with image conditions, allowing conditional sampling in very few steps. We directly distill the unconditional pre-training in a single stage through joint learning, which greatly simplifies previous two-stage procedures that perform distillation and conditional fine-tuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with a shared, frozen unconditional backbone. Experiments across multiple tasks, including super-resolution, image editing, and depth-to-image generation, demonstrate that our method outperforms existing distillation techniques at the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.
