Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Abstract

Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly: a large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce the number of required steps, they generally treat the underlying denoising network as a black box. In this work, we investigate the behavior of the layers within the network and find that 1) the layers' outputs change smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We hypothesize that many layer computations in the denoising network are redundant. Leveraging this, we introduce Block Caching, in which we reuse outputs from layer blocks of previous steps to speed up inference. Furthermore, we propose a technique to automatically determine caching schedules based on each block's changes over timesteps. In our experiments, we show through FID, human evaluation, and qualitative analysis that Block Caching makes it possible to generate images with higher visual quality at the same computational cost. We demonstrate this for different state-of-the-art models (LDM and EMU) and solvers (DDIM and DPM).
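
The caching idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the "blocks" are scalar functions standing in for residual layer blocks, the names `make_schedule` and `run_with_block_cache` are hypothetical, and the scheduling rule (recompute a block once its accumulated step-to-step change exceeds a threshold) is only one plausible way to derive a schedule from each block's changes over timesteps.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A toy "block": maps (hidden value, step index) to a residual update.
Block = Callable[[float, int], float]


def make_schedule(changes: Sequence[float], threshold: float) -> Set[int]:
    """Return the set of steps at which a block should be recomputed.

    changes[t] is an assumed, pre-measured relative change of the block's
    output between step t-1 and step t (changes[0] is ignored).
    The block is always computed at step 0; afterwards it is recomputed
    whenever the change accumulated since the last recompute exceeds
    `threshold`.
    """
    recompute = {0}
    accumulated = 0.0
    for step in range(1, len(changes)):
        accumulated += changes[step]
        if accumulated > threshold:
            recompute.add(step)
            accumulated = 0.0
    return recompute


def run_with_block_cache(
    blocks: List[Block],
    schedules: List[Set[int]],
    x: float,
    num_steps: int,
) -> Tuple[float, int]:
    """Run a toy denoising loop, reusing cached block outputs.

    Returns the final value and the number of block evaluations performed.
    """
    cache: Dict[int, float] = {}
    evaluations = 0
    for step in range(num_steps):
        h = x
        for i, block in enumerate(blocks):
            if step in schedules[i] or i not in cache:
                # Scheduled recompute (or nothing cached yet): run the block.
                cache[i] = block(h, step)
                evaluations += 1
            # Otherwise the output from an earlier step is reused as-is.
            h = h + cache[i]
        x = h
    return x, evaluations


if __name__ == "__main__":
    # Hypothetical per-step change measurements for two blocks.
    slow_block_changes = [0.0, 0.1, 0.1, 0.3, 0.05, 0.05]
    fast_block_changes = [0.0, 0.9, 0.9, 0.9, 0.9, 0.9]
    schedules = [
        make_schedule(slow_block_changes, threshold=0.25),
        make_schedule(fast_block_changes, threshold=0.25),
    ]
    blocks = [lambda h, t: -0.1 * h, lambda h, t: -0.05 * h]
    result, evaluations = run_with_block_cache(blocks, schedules, 1.0, 6)
    print(schedules, evaluations, result)
```

In this sketch a block whose output barely changes is evaluated only a few times, while a rapidly changing block is evaluated at every step, so the saved computation depends on how the per-block changes are distributed over the sampling steps.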
