MLCM: Multistep Consistency Distillation of Latent Diffusion Model
June 9, 2024
Authors: Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Shixiang Tang, Haonan Lu
cs.AI
Abstract
Distilling large latent diffusion models (LDMs) into ones that are fast to
sample from is attracting growing research interest. However, the majority of
existing methods face a dilemma where they either (i) depend on multiple
individual distilled models for different sampling budgets, or (ii) sacrifice
generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8)
sampling steps. To address these issues, we extend the recent multistep consistency
distillation (MCD) strategy to representative LDMs, establishing the Multistep
Latent Consistency Models (MLCMs) approach for low-cost high-quality image
synthesis. MLCM serves as a unified model for various sampling steps due to the
promise of MCD. We further augment MCD with a progressive training strategy to
strengthen inter-segment consistency to boost the quality of few-step
generations. We take states from the sampling trajectories of the teacher
model as training data for MLCMs, which removes the need for a high-quality
external training dataset and bridges the gap between the training and inference of
the distilled model. MLCM is compatible with preference learning strategies for
further improvement of visual quality and aesthetic appeal. Empirically, MLCM
can generate high-quality, delightful images with only 2-8 sampling steps. On
the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of
33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps,
substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and
8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in
applications including controllable generation, image style transfer, and
Chinese-to-image generation.
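For intuition, here is a minimal sketch of the segment-wise consistency objective the abstract describes: the teacher's ODE trajectory is split into segments, trajectory states serve as training data, and the student learns to map any in-segment state directly to the segment endpoint. This is not the authors' implementation; `ddim_step`, `predict_endpoint`, and all signatures below are hypothetical placeholders that only follow the structure described in the abstract.

```python
import torch
import torch.nn.functional as F

def ddim_step(unet, x_t, t, t_next, cond):
    # Hypothetical stand-in for one deterministic ODE-solver (e.g., DDIM)
    # step of the teacher from t to t_next; a real implementation would use
    # the LDM's noise schedule instead of this schematic Euler-style update.
    eps = unet(x_t, t, cond)
    return x_t - (t - t_next) * eps

def predict_endpoint(unet, x_t, t, t_end, cond):
    # Hypothetical consistency-model parameterization: map any latent x_t
    # inside a trajectory segment directly to that segment's endpoint t_end.
    return unet(x_t, t, cond)

def mcd_loss(student, ema_student, teacher, x_t, t, t_next, t_end, cond):
    """Consistency loss for one segment of the teacher's ODE trajectory.

    Per the abstract, x_t is a state taken from the teacher model's own
    sampling trajectory, so no external high-quality dataset is required.
    """
    with torch.no_grad():
        # Teacher advances one solver step along the trajectory.
        x_t_next = ddim_step(teacher, x_t, t, t_next, cond)
        # EMA student predicts the segment endpoint from the later state.
        target = predict_endpoint(ema_student, x_t_next, t_next, t_end, cond)
    # Online student predicts the same endpoint from the earlier state;
    # matching the two enforces consistency within the segment.
    pred = predict_endpoint(student, x_t, t, t_end, cond)
    return F.mse_loss(pred, target)
```

Because the learned mapping is defined per segment rather than per step count, the same distilled model can, in principle, be sampled with any number of steps by chaining segment-endpoint predictions, which is consistent with the abstract's claim of a single unified model for 2-8 step sampling.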