Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

March 18, 2024
Authors: Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach
cs.AI

Abstract

Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from multi-step to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach that overcomes the limitations of ADD. In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models. This approach simplifies training and enhances performance, enabling high-resolution, multi-aspect-ratio image synthesis. We apply LADD to Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. Moreover, we systematically investigate its scaling behavior and demonstrate LADD's effectiveness in various applications such as image editing and inpainting.