BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
June 8, 2023
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
cs.AI
Abstract
Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation to generate synthetic training data from the teacher model, or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion-model teacher at any given time step. Such a model can be trained efficiently by bootstrapping from two consecutive sampled steps. Furthermore, our method adapts easily to large-scale text-to-image diffusion models, which are challenging for conventional methods because their training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving generation quality comparable to the diffusion teacher while being orders of magnitude faster. The text-to-image results show that the proposed approach can handle highly complex distributions, shedding light on more efficient generative modeling.
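
The bootstrapping objective described above lends itself to a compact sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the `Student` architecture, the toy `teacher_step` DDIM-style update, and the time schedule are all illustrative stand-ins. The structure it shows is the one the abstract describes: a time-conditioned student maps pure noise and a time step to a point on the teacher's denoising trajectory, and is supervised by applying one frozen teacher step to its own detached prediction at the adjacent time step, so no real or synthetic training data is needed.

```python
# Hedged sketch of BOOT-style bootstrapping distillation.
# All names and the update rule are hypothetical stand-ins.
import torch
import torch.nn as nn


class Student(nn.Module):
    """Time-conditioned student: maps a fixed noise z and a time t
    to an approximation of the teacher's denoising state at time t,
    in a single forward pass."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append the scalar time step as an extra input feature.
        t_feat = t.expand(z.shape[0], 1)
        return self.net(torch.cat([z, t_feat], dim=-1))


@torch.no_grad()
def teacher_step(x_t, t, t_next, teacher):
    """One deterministic (DDIM-style) step of the frozen teacher.
    A placeholder Euler-style update; a real implementation would use
    the teacher's noise schedule and epsilon-prediction."""
    eps = teacher(x_t, t)
    return x_t + (t_next - t) * eps  # hypothetical ODE-style update


def boot_loss(student, teacher, z, t, t_next):
    """Bootstrapping loss from two consecutive sampled time steps:
    the student at t_next should match one teacher step applied to the
    student's own (detached) prediction at t. Only noise z is consumed."""
    x_t = student(z, t).detach()                   # stop-gradient bootstrap source
    target = teacher_step(x_t, t, t_next, teacher)  # frozen teacher supervision
    pred = student(z, t_next)
    return ((pred - target) ** 2).mean()
```

A single training step under these assumptions might look as follows; in practice the frozen teacher would be a pre-trained diffusion model and (t, t_next) would be drawn from its discretized sampling schedule.

```python
student, teacher = Student(), Student()  # teacher stands in for a frozen pre-trained model
teacher.eval()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

z = torch.randn(8, 64)                             # pure noise; no data needed
t, t_next = torch.tensor(0.8), torch.tensor(0.7)   # two consecutive schedule times
loss = boot_loss(student, teacher, z, t, t_next)
opt.zero_grad(); loss.backward(); opt.step()
```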