Multistep Consistency Models
March 11, 2024
Authors: Jonathan Heek, Emiel Hoogeboom, Tim Salimans
cs.AI
Abstract
Diffusion models are relatively easy to train but require many steps to
generate samples. Consistency models are far more difficult to train, but
generate samples in a single step.
In this paper we propose Multistep Consistency Models: A unification between
Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that
can interpolate between a consistency model and a diffusion model: a trade-off
between sampling speed and sampling quality. Specifically, a 1-step consistency
model is a conventional consistency model, whereas we show that an ∞-step
consistency model is a diffusion model.
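
To make the interpolation concrete, below is a minimal sketch of a multistep consistency sampler. Everything here is an illustrative assumption rather than the paper's exact formulation: the hypothetical `model(x, sigma)` is assumed to map a noisy input at noise level `sigma` directly to a clean-data estimate, and noising is written in a simple variance-exploding style.

```python
import torch

def multistep_consistency_sample(model, sigmas, shape, device="cpu"):
    """Illustrative multistep consistency sampler (a sketch, not the paper's code).

    model(x, sigma): assumed to predict the clean sample from a noisy input.
    sigmas: decreasing noise levels, e.g. [sigma_max, ..., sigma_min].
    """
    # Start from pure noise at the highest noise level.
    x = sigmas[0] * torch.randn(shape, device=device)
    for i in range(len(sigmas)):
        # Consistency step: jump directly to an estimate of the clean sample.
        x0 = model(x, sigmas[i])
        if i + 1 < len(sigmas):
            # Re-noise the estimate to the next, lower noise level so the
            # model can refine it on the following step.
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)
        else:
            x = x0
    return x
```

With a single noise level this reduces to ordinary one-step consistency sampling; as the number of levels grows, each jump-and-renoise step shrinks and the procedure behaves increasingly like ancestral sampling from a diffusion model.
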
Multistep Consistency Models work very well in practice. By increasing the
sample budget from a single step to 2-8 steps, we can more easily train models
that generate higher-quality samples, while retaining much of the sampling-speed
benefit. Notable results are 1.4 FID on Imagenet 64 in 8 steps and 2.1 FID on
Imagenet 128 in 8 steps with consistency distillation. We also show that
our method scales to a text-to-image diffusion model, generating samples that
are very close to the quality of the original model.
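
For reference, here is a rough sketch of what one consistency-distillation training step could look like under this kind of scheme. The names `teacher_step` (one ODE-solver step of a pretrained diffusion teacher) and `ema_student` (a frozen EMA copy of the student that provides regression targets), as well as the simple additive noising, are hypothetical stand-ins, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, ema_student, teacher_step,
                                  x0, t, t_next):
    """Sketch of a single consistency-distillation step (illustrative only).

    teacher_step(x_t, t, t_next): assumed one teacher ODE step from noise
    level t down to level t_next. ema_student provides the frozen target.
    """
    noise = torch.randn_like(x0)
    x_t = x0 + t * noise                      # diffuse data to noise level t
    x_t_next = teacher_step(x_t, t, t_next)   # one teacher step toward data
    with torch.no_grad():
        target = ema_student(x_t_next, t_next)  # regression target
    pred = student(x_t, t)
    # Train the student so its prediction at t matches the EMA student at t_next.
    return F.mse_loss(pred, target)
```

In the multistep variant, t and t_next would be drawn from within one of the segments the noise schedule is divided into, so the student only needs to be consistent across that segment rather than over the entire trajectory.
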