Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Abstract

Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly: a large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce the number of required steps, they generally treat the underlying denoising network as a black box. In this work, we investigate the behavior of the layers within the network and find that 1) the layers' outputs change smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We hypothesize that many layer computations in the denoising network are redundant. Leveraging this, we introduce block caching, in which we reuse outputs from layer blocks of previous steps to speed up inference. Furthermore, we propose a technique to automatically determine caching schedules based on each block's changes over timesteps. In our experiments, we show through FID, human evaluation, and qualitative analysis that block caching makes it possible to generate images with higher visual quality at the same computational cost. We demonstrate this for different state-of-the-art models (LDM and EMU) and solvers (DDIM and DPM).
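To make the idea concrete, the following is a minimal sketch of block caching with a change-based schedule. It is an illustration under stated assumptions, not the authors' implementation: the names `relative_change`, `build_schedule`, and `CachedBlock`, the normalized mean-absolute-change metric, and the accumulate-until-threshold rule are all hypothetical choices made for this example, and blocks are modeled as plain Python functions on lists of floats rather than network layers.

```python
from typing import Callable, List, Optional, Sequence


def relative_change(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Summed absolute change from prev to curr, normalized by the magnitude of prev.

    Assumption: this metric is an illustrative choice for measuring how much
    a block's output moves between two consecutive denoising steps.
    """
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) or 1.0  # guard against division by zero
    return num / den


def build_schedule(changes: Sequence[float], threshold: float) -> List[bool]:
    """Turn per-step change measurements into a recompute/reuse schedule.

    changes[t] is the measured change of the block output between step t-1
    and step t (changes[0] is ignored). The block is always computed at
    step 0; afterwards it is recomputed only once the change accumulated
    since the last recompute exceeds the threshold (hypothetical rule).
    """
    recompute = [True]
    accumulated = 0.0
    for t in range(1, len(changes)):
        accumulated += changes[t]
        if accumulated > threshold:
            recompute.append(True)
            accumulated = 0.0
        else:
            recompute.append(False)
    return recompute


class CachedBlock:
    """Wraps a block so that scheduled steps reuse the previous output."""

    def __init__(self, fn: Callable, recompute: Sequence[bool]) -> None:
        self.fn = fn
        self.recompute = list(recompute)
        self.cache: Optional[object] = None
        self.calls = 0  # number of times the wrapped block actually ran

    def __call__(self, x, step: int):
        if self.cache is None or self.recompute[step]:
            self.cache = self.fn(x)
            self.calls += 1
        return self.cache


if __name__ == "__main__":
    # Made-up change profile: small changes, one large jump, small changes.
    schedule = build_schedule([0.0, 0.04, 0.04, 0.3, 0.02, 0.02], threshold=0.1)
    block = CachedBlock(lambda x: x * 2.0, schedule)
    outputs = [block(float(t), t) for t in range(len(schedule))]
    print(schedule, outputs, block.calls)
```

In this sketch the schedule is computed once, offline, from measured changes, and then applied unchanged at inference time, so the saving comes from the steps at which the wrapped block is skipped and its cached output is returned instead.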
