

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

June 8, 2023
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
cs.AI

Abstract
Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given the fact that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.
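The core training loop described above — a time-conditioned student trained to match the teacher's output by bootstrapping from two consecutive sampled time steps, using only noise as input — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the networks, the `ddim_step` update, and the time discretization are hypothetical placeholders standing in for the pre-trained teacher diffusion model and the actual DDIM formula.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: in practice the teacher is a large pre-trained
# diffusion model and the student a time-conditioned U-Net.
class TimeConditionedNet(nn.Module):
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 32), nn.SiLU(), nn.Linear(32, dim)
        )

    def forward(self, x, t):
        # t: (batch,) in [0, 1], appended as an extra input feature
        return self.net(torch.cat([x, t[:, None]], dim=-1))

teacher = TimeConditionedNet()   # frozen / pre-trained in reality
student = TimeConditionedNet()   # the distilled model being learned

def ddim_step(x, t, dt):
    """One deterministic teacher update (schematic placeholder for
    the real DDIM step; no gradients flow through the teacher)."""
    with torch.no_grad():
        return x + dt * teacher(x, t)

def boot_loss(eps, t, dt):
    # Student prediction at the later (smaller-noise) time t - dt ...
    pred = student(eps, t - dt)
    # ... is regressed onto one teacher step applied to the detached
    # student prediction at time t: the bootstrapping target.
    with torch.no_grad():
        target = ddim_step(student(eps, t), t, dt)
    return ((pred - target) ** 2).mean()

# Data-free: training inputs are pure Gaussian noise, no real images.
eps = torch.randn(8, 4)
t = torch.rand(8) * 0.9 + 0.1   # keep t - dt within [0, 1]
loss = boot_loss(eps, t, dt=0.05)
loss.backward()
```

Because targets come from the student's own (detached) predictions pushed one teacher step forward, no offline synthetic dataset and no real images are needed — only samples of Gaussian noise.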