TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

August 1, 2024
Authors: Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or
cs.AI

Abstract

Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backwards process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
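
To make the abstract's two fixes concrete, below is a minimal, self-contained PyTorch sketch, not the authors' implementation. Every name here (`few_step_edit`, `x0_from_eps`, the toy `eps_model`, the schedule tensors, and the size of the schedule offset) is an illustrative assumption: the sketch pairs an edit-friendly-style DDPM noise inversion with (i) a slightly noisier schedule used only when constructing the noisy latents, standing in for the paper's shifted noise schedule, and (ii) extrapolation between the source-prompt and target-prompt noise predictions, standing in for pseudo-guidance.

```python
import torch

def x0_from_eps(x_t, eps, abar_t):
    # Invert the forward noising x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps
    # to recover the model's current estimate of the clean image.
    return (x_t - (1.0 - abar_t).sqrt() * eps) / abar_t.sqrt()

@torch.no_grad()
def few_step_edit(x0, eps_model, src_prompt, tgt_prompt,
                  abars, abars_noising, w=2.0):
    """Sketch of inversion-based editing with a few-step sampler.

    abars:         alpha-bar per step, noisiest first, ending at 1.0.
    abars_noising: a slightly noisier copy, used only when noising x0;
                   it stands in for the shifted noise schedule (the
                   offset chosen here is illustrative, not the paper's).
    w:             pseudo-guidance weight; w = 1 is a plain edit.
    """
    T = len(abars) - 1
    # 1) Edit-friendly-style inversion: noise x0 *independently* at each
    #    level, then solve for the noise z_t that the sampler must
    #    inject so step t lands exactly on the next latent.
    xs = [ab.sqrt() * x0 + (1 - ab).sqrt() * torch.randn_like(x0)
          for ab in abars_noising[:-1]] + [x0]
    zs = []
    for t in range(T):
        eps = eps_model(xs[t], t, src_prompt)
        mu = abars[t + 1].sqrt() * x0_from_eps(xs[t], eps, abars[t])
        sigma = (1 - abars[t + 1]).sqrt()  # maximal-variance (DDPM-like) step
        zs.append((xs[t + 1] - mu) / sigma if sigma > 0
                  else torch.zeros_like(x0))
    # 2) Re-sample with the target prompt, injecting the stored noises.
    #    Pseudo-guidance: extrapolate the target prediction away from
    #    the source prediction to strengthen the edit.
    x = xs[0]
    for t in range(T):
        eps_src = eps_model(x, t, src_prompt)
        eps_tgt = eps_model(x, t, tgt_prompt)
        eps = eps_src + w * (eps_tgt - eps_src)
        mu = abars[t + 1].sqrt() * x0_from_eps(x, eps, abars[t])
        sigma = (1 - abars[t + 1]).sqrt()
        x = mu + sigma * zs[t]
    return x

# Toy usage with a stand-in denoiser (a real system would use a
# distilled, text-conditioned UNet operating on latents):
def toy_eps_model(x, t, prompt):
    return torch.zeros_like(x)  # pretends x is already clean

img = torch.randn(1, 3, 64, 64)
abars = torch.tensor([0.05, 0.35, 0.75, 1.0])          # 3-step schedule
abars_noising = torch.tensor([0.03, 0.30, 0.65, 1.0])  # slightly noisier
edited = few_step_edit(img, toy_eps_model, "a cat", "a dog",
                       abars, abars_noising)
```

Note the reconstruction property the inversion relies on: with `tgt_prompt == src_prompt`, the extrapolation collapses to the source prediction and the injected noises retrace the original latents exactly (up to the final, deterministic step), so raising `w` strengthens the edit without re-running the inversion.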
