BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

Abstract

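Diffusion models have shown excellent potential for generating diverse images. However, their iterative denoising process makes generation slow. Knowledge distillation has recently been proposed as a solution to this problem, reducing the number of inference steps to one or a few while avoiding significant quality degradation. Existing distillation methods, however, either require large amounts of offline computation to generate synthetic training data from the teacher model or require costly online learning with real data. In this work, we propose BOOT, an efficient data-free distillation algorithm that overcomes these limitations. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher at any time step. This model can be trained efficiently by bootstrapping from two consecutive sampling steps. Furthermore, the method adapts easily to large-scale text-to-image diffusion models, a setting that is difficult for conventional methods because the training sets are large and hard to access. We demonstrate the method's effectiveness on several benchmark datasets in the DDIM setting, achieving generation that is orders of magnitude faster while maintaining quality comparable to the diffusion teacher. The text-to-image results show that the method can handle highly complex distributions, paving the way for more efficient generative modeling.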

English

Diffusion models have demonstrated excellent potential for generating diverse images. However, they often suffer from slow generation due to iterative denoising. Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation to generate synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods because the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.
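
The bootstrapping idea can be illustrated with a toy sketch. This is not the paper's objective or parameterization, which the abstract does not specify. It assumes a 1-D Gaussian data distribution (so the "teacher" noise prediction has a closed form), a cosine/sine noise schedule, and a student indexed by a discrete time grid. All names (`teacher_eps`, `ddim_step`, `bootstrap_loss`, `trajectory_student`, `identity_student`, `TIMES`, `DATA_STD`) are hypothetical. The point it shows is that the training target uses only noise and the teacher, never real data.

```python
import numpy as np

DATA_STD = 2.0                      # assumed toy data distribution: N(0, DATA_STD^2)
TIMES = np.linspace(0.99, 0.0, 11)  # assumed grid: t = 0.99 (almost pure noise) down to t = 0


def alpha(t):
    """Signal scale of the assumed schedule, x_t = alpha(t) * x_0 + sigma(t) * eps."""
    return np.cos(0.5 * np.pi * t)


def sigma(t):
    """Noise scale of the assumed schedule."""
    return np.sin(0.5 * np.pi * t)


def teacher_eps(x_t, t):
    """Closed-form E[eps | x_t] for Gaussian data; stands in for a pre-trained network."""
    var_t = alpha(t) ** 2 * DATA_STD ** 2 + sigma(t) ** 2
    return sigma(t) * x_t / var_t


def ddim_step(x_t, t, s):
    """One deterministic DDIM step from time t to an earlier time s."""
    eps_hat = teacher_eps(x_t, t)
    x0_hat = (x_t - sigma(t) * eps_hat) / alpha(t)
    return alpha(s) * x0_hat + sigma(s) * eps_hat


def bootstrap_loss(student, eps, k):
    """MSE between the student at grid step k+1 and one teacher step from the student at step k.

    In actual training the target would be treated as a constant (no gradient).
    Only the input noise `eps` is needed; no real data appears anywhere.
    """
    target = ddim_step(student(eps, k), TIMES[k], TIMES[k + 1])
    return float(np.mean((student(eps, k + 1) - target) ** 2))


def trajectory_student(eps, k):
    """Reference student: follows the teacher's DDIM trajectory for k steps."""
    x = eps
    for i in range(k):
        x = ddim_step(x, TIMES[i], TIMES[i + 1])
    return x


def identity_student(eps, k):
    """Deliberately wrong student that ignores the time index."""
    return eps
```

With `eps = np.random.default_rng(0).standard_normal(1000)`, `bootstrap_loss(trajectory_student, eps, k)` is zero at every grid step, while `identity_student` gives a positive loss. A learned student would be trained to drive this loss toward zero over randomly sampled steps, after which its output at the last grid index gives a one-step sample.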
