MLCM: Multistep Consistency Distillation of Latent Diffusion Model
June 9, 2024
Authors: Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Shixiang Tang, Haonan Lu
cs.AI
Abstract
Distilling large latent diffusion models (LDMs) into ones that are fast to
sample from is attracting growing research interest. However, the majority of
existing methods face a dilemma where they either (i) depend on multiple
individual distilled models for different sampling budgets, or (ii) sacrifice
generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8)
sampling steps. To address these issues, we extend the recent multistep consistency
distillation (MCD) strategy to representative LDMs, establishing the Multistep
Latent Consistency Models (MLCMs) approach for low-cost high-quality image
synthesis. MLCM serves as a unified model for various sampling steps due to the
promise of MCD. We further augment MCD with a progressive training strategy to
strengthen inter-segment consistency, boosting the quality of few-step
generations. We take the states from the sampling trajectories of the teacher
model as training data for MLCMs, removing the need for high-quality
training datasets and bridging the gap between the training and inference of
the distilled model. MLCM is compatible with preference learning strategies for
further improvement of visual quality and aesthetic appeal. Empirically, MLCM
can generate high-quality, delightful images with only 2-8 sampling steps. On
the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of
33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps,
substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and
8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in
applications including controllable generation, image style transfer, and
Chinese-to-image generation.
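To make the distillation idea in the abstract concrete, below is a minimal, self-contained sketch of one multistep consistency distillation training step. It is not the authors' implementation: tiny MLP denoisers stand in for the SDXL latent UNet, a variance-preserving noise schedule and a single DDIM step serve as the teacher ODE solver, and all names (`TinyDenoiser`, `mlcm_loss`, `ddim_step`, the segment count `K`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of one multistep consistency distillation (MCD) training step in
# latent space. Assumptions: tiny MLP denoisers replace the SDXL UNet, a
# variance-preserving schedule, and one DDIM step as the teacher ODE solver.

T, K, DIM = 1000, 4, 16                        # timesteps, MCD segments, latent dim
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(z0, eps, t):
    """Forward-diffuse clean latents z0 to timestep t."""
    a = alphas_bar[t].view(-1, 1)
    return a.sqrt() * z0 + (1.0 - a).sqrt() * eps

def ddim_step(model, z_t, t, s):
    """One deterministic DDIM step from timestep t to s (s <= t)."""
    eps = model(z_t, t)
    a_t = alphas_bar[t].view(-1, 1)
    a_s = alphas_bar[s].view(-1, 1)
    z0_hat = (z_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_s.sqrt() * z0_hat + (1.0 - a_s).sqrt() * eps

class TinyDenoiser(nn.Module):
    """Stand-in noise-prediction network (the real model is a latent UNet)."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
    def forward(self, z, t):
        return self.net(torch.cat([z, t.float().view(-1, 1) / T], dim=-1))

teacher, student, ema_student = TinyDenoiser(), TinyDenoiser(), TinyDenoiser()
ema_student.load_state_dict(student.state_dict())

def mlcm_loss(z0):
    b = z0.shape[0]
    seg_len = T // K
    t_lo = torch.randint(0, K, (b,)) * seg_len     # lower boundary of a random segment
    t = t_lo + torch.randint(1, seg_len, (b,))     # timestep inside that segment
    s = t - 1                                      # adjacent earlier timestep
    z_t = add_noise(z0, torch.randn_like(z0), t)
    with torch.no_grad():
        z_s = ddim_step(teacher, z_t, t, s)               # teacher ODE step t -> s
        target = ddim_step(ema_student, z_s, s, t_lo)     # EMA student: z_s -> boundary
    pred = ddim_step(student, z_t, t, t_lo)               # student: z_t -> boundary
    return ((pred - target) ** 2).mean()                  # intra-segment consistency

loss = mlcm_loss(torch.randn(8, DIM))  # toy batch of "latents"
loss.backward()
```

The MCD property the abstract relies on is visible here: the trajectory is split into K segments and the student only has to be self-consistent within each segment, which is what lets a single distilled model serve different sampling budgets (e.g., 2-8 steps) at inference. In this sketch the teacher ODE step is also where the paper's trajectory-state training data would come from; the progressive inter-segment training and preference learning stages are omitted.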