SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Abstract

Recent advancements in diffusion models have positioned them at the forefront of image generation. Despite their superior performance, diffusion models are not without drawbacks; they are characterized by complex architectures and substantial computational demands, resulting in significant latency due to their iterative sampling process. To mitigate these limitations, we introduce a dual approach involving model miniaturization and a reduction in sampling steps, aimed at significantly decreasing model latency. Our methodology leverages knowledge distillation to streamline the U-Net and image decoder architectures, and introduces an innovative one-step diffusion model (DM) training technique that utilizes feature matching and score distillation. We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU, respectively. Moreover, our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
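To put the quoted throughput figures in perspective, the following is a minimal sketch that converts them into per-image latency and into the baseline throughput implied by each speedup factor. It assumes one image per forward pass (so latency is simply the reciprocal of FPS) and treats the approximate numbers from the abstract as exact; the function names are illustrative and not taken from the paper.

```python
def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming one image per pass."""
    return 1000.0 / fps


def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Throughput of the baseline model implied by a reported speedup."""
    return fps / speedup


# Approximate figures as quoted in the abstract (hardware not specified there).
reported = {
    "SDXS-512": {"fps": 100.0, "speedup": 30.0, "baseline": "SD v1.5"},
    "SDXS-1024": {"fps": 30.0, "speedup": 60.0, "baseline": "SDXL"},
}

for name, r in reported.items():
    base_fps = implied_baseline_fps(r["fps"], r["speedup"])
    print(
        f"{name}: {latency_ms(r['fps']):.1f} ms/image; "
        f"{r['baseline']} implied at {base_fps:.2f} FPS "
        f"({latency_ms(base_fps):.0f} ms/image)"
    )
```

Under these assumptions, the speedups correspond to baselines that take on the order of hundreds of milliseconds to a few seconds per image, which is consistent with the abstract's point that iterative sampling dominates latency.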
