TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
August 1, 2024
Authors: Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or
cs.AI
Abstract
Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backward process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
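The abstract compresses two technical ideas: aligning the statistics of inverted noise with the noise level the sampler expects at each step, and strengthening edits by extrapolating between two noise predictions. The sketch below is a rough NumPy illustration of both ideas under stated assumptions, not the paper's actual implementation: shifted_timestep re-indexes a step to the schedule position whose expected noise std best matches an inverted noise map, and pseudo_guidance extrapolates from the source-prompt prediction toward the target-prompt prediction with weight w > 1. The function names, the linear-beta schedule, and the exact form of the shift are all assumptions for illustration.

```python
import numpy as np

# Illustrative DDPM schedule (linear betas); the paper's actual few-step
# schedule and the precise form of its shift are not given in the abstract.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)
expected_std = np.sqrt(1.0 - alphas_bar)  # noise std the sampler assumes at step t

def shifted_timestep(inverted_noise: np.ndarray) -> int:
    """Hypothetical correction: instead of trusting the nominal timestep,
    pick the timestep whose expected noise std matches the empirical std
    of the inverted noise map, so noise statistics and schedule agree."""
    measured_std = float(inverted_noise.std())
    return int(np.argmin(np.abs(expected_std - measured_std)))

def pseudo_guidance(eps_src: np.ndarray,
                    eps_tgt: np.ndarray,
                    w: float = 1.5) -> np.ndarray:
    """Amplify the edit by extrapolating along the (target - source)
    direction of the noise predictions: w = 1 recovers the plain
    target-prompt prediction, w > 1 magnifies the edit. Both predictions
    are assumed already computed, so no extra network evaluations occur."""
    return eps_src + w * (eps_tgt - eps_src)

# Toy usage with random stand-ins for real noise maps / model outputs.
z = 0.9 * np.random.randn(4, 64, 64)   # "inverted" noise, slightly off-schedule
t = shifted_timestep(z)                 # schedule index whose std best fits z
eps = pseudo_guidance(np.random.randn(4, 64, 64),
                      np.random.randn(4, 64, 64), w=1.5)
```

The extrapolation has the same shape as classifier-free guidance, which is presumably why the authors call it "pseudo"-guidance; the abstract's claim is that this raises edit magnitude without introducing new artifacts.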