BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
June 8, 2023
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
cs.AI
Abstract
Diffusion models have demonstrated excellent potential for generating diverse
images. However, their performance often suffers from slow generation due to
iterative denoising. Knowledge distillation has been recently proposed as a
remedy that can reduce the number of inference steps to one or a few without
significant quality degradation. However, existing distillation methods either
require significant amounts of offline computation for generating synthetic
training data from the teacher model or need to perform expensive online
learning with the help of real data. In this work, we present a novel technique
called BOOT, which overcomes these limitations with an efficient data-free
distillation algorithm. The core idea is to learn a time-conditioned model that
predicts the output of a pre-trained diffusion model teacher given any time
step. Such a model can be efficiently trained based on bootstrapping from two
consecutive sampled steps. Furthermore, our method can be easily adapted to
large-scale text-to-image diffusion models, which are challenging for
conventional methods because their training sets are often large and
difficult to access. We demonstrate the effectiveness of our approach on
several benchmark datasets in the DDIM setting, achieving generation quality
comparable to the diffusion teacher while being orders of magnitude faster. The
text-to-image results show that the proposed approach is able to handle highly
complex distributions, shedding light on more efficient generative modeling.