SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
March 25, 2024
Authors: Yuda Song, Zehao Sun, Xuanwu Yin
cs.AI
Abstract
Recent advancements in diffusion models have positioned them at the forefront
of image generation. Despite their superior performance, diffusion models are
not without drawbacks; they are characterized by complex architectures and
substantial computational demands, resulting in significant latency due to
their iterative sampling process. To mitigate these limitations, we introduce a
dual approach involving model miniaturization and a reduction in sampling
steps, aimed at significantly decreasing model latency. Our methodology
leverages knowledge distillation to streamline the U-Net and image decoder
architectures, and introduces an innovative one-step DM training technique that
utilizes feature matching and score distillation. We present two models,
SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS
(30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU,
respectively. Moreover, our training approach offers promising applications in
image-conditioned control, facilitating efficient image-to-image translation.
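The abstract names two training signals for the one-step student: feature matching against the teacher and score distillation. The sketch below is a rough PyTorch illustration of how such a combined objective could be wired up; every name in it (student_unet, teacher_unet, feature_extractor, the loss weights) is a hypothetical assumption for illustration, not the authors' actual implementation, and the teacher target here is a single-pass approximation rather than a full multi-step sample.

```python
import torch
import torch.nn.functional as F

def one_step_distillation_loss(student_unet, teacher_unet, feature_extractor,
                               alphas_cumprod, latents, text_emb,
                               lambda_fm=1.0, lambda_sd=1.0):
    """Illustrative feature-matching + score-distillation loss for a one-step student."""
    b = latents.shape[0]
    device = latents.device

    # One-step student: map pure noise (at the largest timestep) directly to a latent.
    noise = torch.randn_like(latents)
    t_max = torch.full((b,), 999, device=device, dtype=torch.long)
    student_x0 = student_unet(noise, t_max, text_emb)

    # Frozen teacher reference at the same timestep (a real pipeline would run
    # the teacher's multi-step sampler to obtain a higher-quality target).
    with torch.no_grad():
        teacher_x0 = teacher_unet(noise, t_max, text_emb)

    # Feature matching: align intermediate representations of the two outputs
    # under a shared frozen feature extractor, rather than raw latents.
    loss_fm = F.mse_loss(feature_extractor(student_x0),
                         feature_extractor(teacher_x0))

    # Score distillation (SDS-style): re-noise the student output, ask the
    # frozen teacher to predict the injected noise, and use the residual as a
    # gradient direction pulling the student toward the teacher's distribution.
    t = torch.randint(0, 1000, (b,), device=device)
    eps = torch.randn_like(student_x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * student_x0 + (1.0 - a).sqrt() * eps
    with torch.no_grad():
        eps_pred = teacher_unet(noisy, t, text_emb)
    grad = eps_pred - eps
    loss_sd = F.mse_loss(student_x0, (student_x0 - grad).detach())

    return lambda_fm * loss_fm + lambda_sd * loss_sd
```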