LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
November 9, 2023
Authors: Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao
cs.AI
Abstract
Latent Consistency Models (LCMs) have achieved impressive performance in
accelerating text-to-image generative tasks, producing high-quality images with
minimal inference steps. LCMs are distilled from pre-trained latent diffusion
models (LDMs), requiring only ~32 A100 GPU training hours. This report further
extends LCMs' potential in two aspects: First, by applying LoRA distillation to
Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded
LCM's scope to larger models with significantly less memory consumption,
achieving superior image generation quality. Second, we identify the LoRA
parameters obtained through LCM distillation as a universal Stable-Diffusion
acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into
various Stable-Diffusion fine-tuned models or LoRAs without training, thus
representing a universally applicable accelerator for diverse image generation
tasks. Compared with previous numerical PF-ODE solvers such as DDIM and
DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that
possesses strong generalization abilities. Project page:
https://github.com/luosiallen/latent-consistency-model.
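
As an illustration of the "plug-in" usage described in the abstract, the sketch below shows how released LCM-LoRA weights can be attached to an existing Stable-Diffusion pipeline. It is a minimal sketch, assuming the Hugging Face diffusers library (with LCMScheduler support) and the publicly released "latent-consistency/lcm-lora-sdv1-5" adapter; the model IDs, scheduler choice, and sampling settings are assumptions based on the project's public release, not details stated in this abstract.

# Minimal sketch: plugging LCM-LoRA into a Stable-Diffusion pipeline.
# Assumes the diffusers library with LCMScheduler and the released
# "latent-consistency/lcm-lora-sdv1-5" weights (assumptions, not from the abstract).
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load any SD-V1.5-based (possibly fine-tuned) pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and attach the LCM-LoRA acceleration module;
# no additional training is required.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Few-step sampling: 4 inference steps with low guidance.
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora_sample.png")

The same pattern would apply to SSD-1B or SDXL pipelines by loading the corresponding LCM-LoRA adapter for that base model.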