SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
March 25, 2024
Authors: Yuda Song, Zehao Sun, Xuanwu Yin
cs.AI
Abstract
Recent advancements in diffusion models have positioned them at the forefront
of image generation. Despite their superior performance, diffusion models are
not without drawbacks; they are characterized by complex architectures and
substantial computational demands, resulting in significant latency due to
their iterative sampling process. To mitigate these limitations, we introduce a
dual approach involving model miniaturization and a reduction in sampling
steps, aimed at significantly decreasing model latency. Our methodology
leverages knowledge distillation to streamline the U-Net and image decoder
architectures, and introduces an innovative one-step DM training technique that
utilizes feature matching and score distillation. We present two models,
SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS
(30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU,
respectively. Moreover, our training approach offers promising applications in
image-conditioned control, facilitating efficient image-to-image translation.
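The one-step training objective described above, combining a feature-matching loss against a multi-step teacher with a score-distillation term, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `fake_features` function and the toy weight matrices stand in for the actual teacher and student U-Nets, and the loss weighting is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_features(x, w):
    # Hypothetical stand-in for intermediate U-Net activations;
    # a real implementation would hook the teacher/student networks.
    h = np.tanh(x @ w)
    return [h, h ** 2]

def feature_matching_loss(student_feats, teacher_feats):
    # MSE between corresponding student and teacher feature maps.
    return sum(np.mean((s - t) ** 2)
               for s, t in zip(student_feats, teacher_feats))

def score_distillation_grad(eps_teacher, eps_student):
    # Score-distillation-style update direction: the difference
    # between the teacher's and student's noise predictions.
    return eps_teacher - eps_student

# Toy "latents" and weights standing in for real networks.
x = rng.standard_normal((4, 8))
w_teacher = rng.standard_normal((8, 8))
w_student = w_teacher + 0.1 * rng.standard_normal((8, 8))

fm = feature_matching_loss(fake_features(x, w_student),
                           fake_features(x, w_teacher))
grad = score_distillation_grad(x @ w_teacher, x @ w_student)
# Assumed weighting between the two terms (not from the paper).
total = fm + 0.5 * np.mean(grad ** 2)
```

In an actual training loop, `total` would be backpropagated through the one-step student while the teacher stays frozen; the relative weight of the two terms is a tunable hyperparameter.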