Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
March 18, 2024
Authors: Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach
cs.AI
Abstract
Diffusion models are the main driver of progress in image and video
synthesis, but suffer from slow inference speed. Distillation methods, like the
recently introduced adversarial diffusion distillation (ADD), aim to shift the
model from many-shot to single-step inference, albeit at the cost of expensive
and difficult optimization due to its reliance on a fixed pretrained DINOv2
discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a
novel distillation approach overcoming the limitations of ADD. In contrast to
pixel-based ADD, LADD utilizes generative features from pretrained latent
diffusion models. This approach simplifies training and enhances performance,
enabling high-resolution multi-aspect ratio image synthesis. We apply LADD to
Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the
performance of state-of-the-art text-to-image generators using only four
unguided sampling steps. Moreover, we systematically investigate its scaling
behavior and demonstrate LADD's effectiveness in various applications such as
image editing and inpainting.