MLCM: Multistep Consistency Distillation of Latent Diffusion Model
June 9, 2024
Authors: Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Shixiang Tang, Haonan Lu
cs.AI
Abstract
Distilling large latent diffusion models (LDMs) into ones that are fast to
sample from is attracting growing research interest. However, the majority of
existing methods face a dilemma where they either (i) depend on multiple
individual distilled models for different sampling budgets, or (ii) sacrifice
generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8)
sampling steps. To address these issues, we extend the recent multistep consistency
distillation (MCD) strategy to representative LDMs, establishing the Multistep
Latent Consistency Models (MLCMs) approach for low-cost high-quality image
synthesis. MLCM serves as a unified model for various sampling steps due to the
promise of MCD. We further augment MCD with a progressive training strategy to
strengthen inter-segment consistency, boosting the quality of few-step
generations. We take the states from the sampling trajectories of the teacher
model as training data for MLCMs, removing the need for high-quality
training datasets and bridging the gap between the training and inference of
the distilled model. MLCM is compatible with preference learning strategies for
further improvement of visual quality and aesthetic appeal. Empirically, MLCM
can generate high-quality, delightful images with only 2-8 sampling steps. On
the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of
33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps,
substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and
8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in
applications including controllable generation, image style transfer, and
Chinese-to-image generation.
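To make the distillation idea in the abstract concrete, below is a minimal, self-contained sketch of one multistep consistency distillation training step. It is not the authors' implementation: tiny MLP denoisers stand in for the SDXL latent UNet, a variance-preserving noise schedule and a single DDIM step serve as the teacher ODE solver, and all names (`TinyDenoiser`, `mlcm_loss`, `ddim_step`, the segment count `K`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of one multistep consistency distillation (MCD) training step in
# latent space. Assumptions: tiny MLP denoisers replace the SDXL UNet, a
# variance-preserving schedule, and one DDIM step as the teacher ODE solver.

T, K, DIM = 1000, 4, 16                        # timesteps, MCD segments, latent dim
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(z0, eps, t):
    """Forward-diffuse clean latents z0 to timestep t."""
    a = alphas_bar[t].view(-1, 1)
    return a.sqrt() * z0 + (1.0 - a).sqrt() * eps

def ddim_step(model, z_t, t, s):
    """One deterministic DDIM step from timestep t to s (s <= t)."""
    eps = model(z_t, t)
    a_t = alphas_bar[t].view(-1, 1)
    a_s = alphas_bar[s].view(-1, 1)
    z0_hat = (z_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_s.sqrt() * z0_hat + (1.0 - a_s).sqrt() * eps

class TinyDenoiser(nn.Module):
    """Stand-in noise-prediction network (the real model is a latent UNet)."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
    def forward(self, z, t):
        return self.net(torch.cat([z, t.float().view(-1, 1) / T], dim=-1))

teacher, student, ema_student = TinyDenoiser(), TinyDenoiser(), TinyDenoiser()
ema_student.load_state_dict(student.state_dict())

def mlcm_loss(z0):
    b = z0.shape[0]
    seg_len = T // K
    t_lo = torch.randint(0, K, (b,)) * seg_len     # lower boundary of a random segment
    t = t_lo + torch.randint(1, seg_len, (b,))     # timestep inside that segment
    s = t - 1                                      # adjacent earlier timestep
    z_t = add_noise(z0, torch.randn_like(z0), t)
    with torch.no_grad():
        z_s = ddim_step(teacher, z_t, t, s)               # teacher ODE step t -> s
        target = ddim_step(ema_student, z_s, s, t_lo)     # EMA student: z_s -> boundary
    pred = ddim_step(student, z_t, t, t_lo)               # student: z_t -> boundary
    return ((pred - target) ** 2).mean()                  # intra-segment consistency

loss = mlcm_loss(torch.randn(8, DIM))  # toy batch of "latents"
loss.backward()
```

The MCD property the abstract relies on is visible here: the trajectory is split into K segments and the student only has to be self-consistent within each segment, which is what lets a single distilled model serve different sampling budgets (e.g., 2-8 steps) at inference. In this sketch the teacher ODE step is also where the paper's trajectory-state training data would come from; the progressive inter-segment training and preference learning stages are omitted.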