Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
June 20, 2024
Authors: Nikita Starodubcev, Mikhail Khoroshikh, Artem Babenko, Dmitry Baranchuk
cs.AI
Abstract
Diffusion distillation represents a highly promising direction for achieving
faithful text-to-image generation in a few sampling steps. However, despite
recent successes, existing distilled models still do not provide the full
spectrum of diffusion abilities, such as real image inversion, which enables
many precise image manipulation methods. This work aims to enrich distilled
text-to-image diffusion models with the ability to effectively encode real
images into their latent space. To this end, we introduce invertible
Consistency Distillation (iCD), a generalized consistency distillation
framework that facilitates both high-quality image synthesis and accurate image
encoding in only 3-4 inference steps. Though the inversion problem for
text-to-image diffusion models gets exacerbated by high classifier-free
guidance scales, we notice that dynamic guidance significantly reduces
reconstruction errors without noticeable degradation in generation performance.
As a result, we demonstrate that iCD equipped with dynamic guidance may serve
as a highly effective tool for zero-shot text-guided image editing, competing
with more expensive state-of-the-art alternatives.
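To make the role of the guidance scale concrete, here is a minimal sketch of classifier-free guidance with a time-dependent scale. The combination rule is the standard classifier-free guidance formula. The step schedule, the threshold `t_off`, the default `w_max`, and the function names are illustrative assumptions: the abstract does not specify the schedule iCD uses.

```python
def guidance_scale(t, w_max=7.0, t_off=0.8):
    """Hypothetical dynamic schedule (an assumption, not taken from the abstract).

    t is a normalized noise level in [0, 1], where 1 means pure noise.
    Guidance is disabled (w = 1) for t >= t_off and set to w_max otherwise.
    """
    return 1.0 if t >= t_off else w_max


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination of two noise predictions.

    With w = 1 this returns the conditional prediction unchanged;
    larger w extrapolates away from the unconditional prediction.
    Works on plain floats or on array types that support arithmetic.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


# Usage with scalar stand-ins for the two noise predictions.
for t in (0.9, 0.5, 0.1):
    w = guidance_scale(t)
    print(t, w, cfg_combine(1.0, 3.0, w))
```

With `w = 1` the guided prediction equals the conditional one, so under this assumed schedule the high-noise steps run without extrapolation, while the remaining steps use the full scale.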