Multistep Consistency Models
March 11, 2024
Authors: Jonathan Heek, Emiel Hoogeboom, Tim Salimans
cs.AI
Abstract
Diffusion models are relatively easy to train but require many steps to
generate samples. Consistency models are far more difficult to train, but
generate samples in a single step.
In this paper we propose Multistep Consistency Models: A unification between
Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that
can interpolate between a consistency model and a diffusion model: a trade-off
between sampling speed and sampling quality. Specifically, a 1-step consistency
model is a conventional consistency model, whereas we show that an ∞-step
consistency model is a diffusion model.
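To make the interpolation concrete, here is one hedged reading under standard diffusion notation (a sketch, not the paper's derivation): writing a noisy latent as $z_t = \alpha_t x + \sigma_t \epsilon$, a consistency step over a segment from time $t$ down to $s$ can be viewed as the DDIM-style update

$$z_s = \alpha_s \, \hat{x}(z_t, t) + \sigma_s \, \frac{z_t - \alpha_t \, \hat{x}(z_t, t)}{\sigma_t},$$

where $\hat{x}(z_t, t)$ is the model's clean-sample estimate. With a single segment ($s = 0$, so $\alpha_0 = 1$ and $\sigma_0 = 0$) this collapses to one-step consistency sampling, while ever finer segments shrink each update toward the infinitesimal steps of a diffusion sampler.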
Multistep Consistency Models work really well in practice. By increasing the
sample budget from a single step to 2-8 steps, we can train models more easily
that generate higher quality samples, while retaining much of the sampling
speed benefits. Notable results are 1.4 FID on Imagenet 64 in 8 steps and 2.1
FID on Imagenet 128 in 8 steps with consistency distillation. We also show that
our method scales to a text-to-image diffusion model, generating samples that
are very close to the quality of the original model.
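For intuition, here is a minimal sketch of what sampling with a multistep consistency model could look like, assuming a trained consistency model `model(z, t)` that maps a noisy latent directly to a clean-sample estimate; the function signature, the cosine noise schedule, and the re-noising rule are illustrative assumptions rather than the paper's exact algorithm.

```python
import math
import torch

@torch.no_grad()
def multistep_consistency_sample(model, shape, num_steps=8, device="cpu"):
    """Illustrative multistep consistency sampler (a sketch, not the paper's code).

    Assumes `model(z, t)` maps a noisy latent z_t = alpha(t) * x + sigma(t) * eps
    directly to an estimate of the clean sample x, as a consistency model does.
    """
    def alpha(t):  # cosine schedule: an assumed parameterization
        return math.cos(0.5 * math.pi * t)

    def sigma(t):
        return math.sin(0.5 * math.pi * t)

    # Segment boundaries from t = 1 (pure noise) down to t = 0 (clean data).
    ts = [i / num_steps for i in range(num_steps, -1, -1)]
    z = torch.randn(shape, device=device)  # start at z_1 = pure noise
    for t, s in zip(ts[:-1], ts[1:]):
        x0 = model(z, t)  # consistency jump: straight to a clean estimate
        if s > 0:
            # DDIM-style re-noising to the next (lower) segment boundary.
            z = alpha(s) * x0 + sigma(s) * (z - alpha(t) * x0) / sigma(t)
        else:
            z = x0  # final boundary: keep the clean estimate
    return z
```

With `num_steps=1` this reduces to a single consistency jump from noise to data; raising the budget to 2-8 steps trades some sampling speed for the quality gains reported above.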