Cache Me if You Can: Accelerating Diffusion Models through Block Caching
December 6, 2023
Authors: Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang
cs.AI
Abstract
Diffusion models have recently revolutionized the field of image synthesis
due to their ability to generate photorealistic images. However, one of the
major drawbacks of diffusion models is that the image generation process is
costly. A large image-to-image network has to be applied many times to
iteratively refine an image from random noise. While many recent works propose
techniques to reduce the number of required steps, they generally treat the
underlying denoising network as a black box. In this work, we investigate the
behavior of the layers within the network and find that 1) the layers' outputs
change smoothly over time, 2) the layers show distinct patterns of change, and
3) the change from step to step is often very small. We hypothesize that many
layer computations in the denoising network are redundant. Leveraging this, we
introduce block caching, in which we reuse outputs from layer blocks of
previous steps to speed up inference. Furthermore, we propose a technique to
automatically determine caching schedules based on each block's changes over
timesteps. In our experiments, we show through FID, human evaluation and
qualitative analysis that Block Caching enables generating images with higher
visual quality at the same computational cost. We demonstrate this for
different state-of-the-art models (LDM and EMU) and solvers (DDIM and DPM).
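To make the mechanism concrete, below is a minimal sketch of block caching on a toy sequential denoiser. It is not the authors' implementation: the toy block, the relative-change metric, the 0.05 threshold, and the placeholder solver update are illustrative assumptions. The sketch first runs a calibration pass to derive a caching schedule from each block's change across timesteps, then reuses cached block outputs during inference wherever the schedule allows.

```python
# Illustrative sketch of Block Caching (not the paper's code).
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Stand-in for one residual/attention block of a denoising network."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return x + self.net(x) * t  # toy timestep conditioning


def relative_change(new: torch.Tensor, old: torch.Tensor) -> float:
    """Relative L1 change of a block's output between consecutive steps."""
    return ((new - old).abs().mean() / (old.abs().mean() + 1e-8)).item()


@torch.no_grad()
def build_cache_schedule(blocks, x, timesteps, threshold=0.05):
    """Calibration pass: run every block at every step and mark (step, block)
    pairs whose output changed less than `threshold` as safe to reuse."""
    prev_out = [None] * len(blocks)
    schedule = []  # schedule[s][b] == True -> reuse block b's cached output at step s
    for t in timesteps:
        reuse_flags = []
        h = x
        for b, block in enumerate(blocks):
            h = block(h, t)
            small = prev_out[b] is not None and relative_change(h, prev_out[b]) < threshold
            reuse_flags.append(small)
            prev_out[b] = h
        schedule.append(reuse_flags)
        x = x - 0.1 * h  # placeholder update standing in for the solver step
    return schedule


@torch.no_grad()
def denoise_with_caching(blocks, x, timesteps, schedule):
    """Inference pass: skip a block whenever the schedule says its cached
    output from an earlier step can be reused."""
    cache = [None] * len(blocks)
    for s, t in enumerate(timesteps):
        h = x
        for b, block in enumerate(blocks):
            if schedule[s][b] and cache[b] is not None:
                h = cache[b]       # reuse output computed at a previous step
            else:
                h = block(h, t)    # recompute and refresh the cache
                cache[b] = h
        x = x - 0.1 * h  # placeholder solver update
    return x


if __name__ == "__main__":
    dim, steps = 16, 20
    blocks = nn.ModuleList(ToyBlock(dim) for _ in range(4)).eval()
    x = torch.randn(1, dim)
    ts = torch.linspace(1.0, 0.0, steps)
    schedule = build_cache_schedule(blocks, x.clone(), ts)
    out = denoise_with_caching(blocks, x, ts, schedule)
    skipped = sum(sum(row) for row in schedule)
    print(f"Reused {skipped} of {steps * len(blocks)} block evaluations; output shape {tuple(out.shape)}")
```

In this toy setup the compute saved is whatever fraction of block evaluations the calibration pass marks as reusable; the paper applies the same idea to the blocks of large denoising networks such as LDM and EMU, where skipped block evaluations can instead be spent on more solver steps at the same cost.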