Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Abstract

Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to ADD's reliance on a fixed pretrained DINOv2 discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach that overcomes the limitations of ADD. In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models. This approach simplifies training and enhances performance, enabling high-resolution, multi-aspect-ratio image synthesis. We apply LADD to Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. Moreover, we systematically investigate its scaling behavior and demonstrate LADD's effectiveness in various applications such as image editing and inpainting.
