TurboEdit: Instant text-based image editing

August 14, 2024
Authors: Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman
cs.AI

Abstract

We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on both the input image and the reconstructed image from the previous step, which allows the next reconstruction to be corrected towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction-based editing driven by an LLM), which generates a new image similar to the input image with only one attribute changed. The method can also control the editing strength and accept instructive text prompts. Our approach facilitates realistic text-guided image edits in real time, requiring a number of function evaluations (NFE) of only 8 for inversion (a one-time cost) and 4 per edit. Our method is not only fast, but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.
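
To make the iterative inversion idea and the NFE accounting concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `toy_generator` and `toy_inversion_step` are hypothetical linear stand-ins for the few-step diffusion generator and the learned inversion network, and the step count and correction rate are illustrative assumptions. Only the costs of 8 NFEs for inversion and 4 NFEs per edit come from the abstract.

```python
import numpy as np

INVERSION_NFE = 8  # one-time inversion cost quoted in the abstract
EDIT_NFE = 4       # per-edit cost quoted in the abstract


def total_nfe(num_edits: int) -> int:
    """Function evaluations for one inversion followed by num_edits edits."""
    if num_edits < 0:
        raise ValueError("num_edits must be non-negative")
    return INVERSION_NFE + EDIT_NFE * num_edits


def toy_generator(noise: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the generator: a fixed linear map."""
    return weights @ noise


def toy_inversion_step(image, reconstruction, weights, rate=0.5):
    """Hypothetical stand-in for the inversion network.

    Like the network described in the abstract, it sees the input image and
    the previous reconstruction; here it returns a noise correction computed
    from their difference instead of a learned prediction.
    """
    return rate * np.linalg.pinv(weights) @ (image - reconstruction)


def iterative_inversion(image, weights, num_steps=4):
    """Refine the noise so each reconstruction moves towards the input image."""
    noise = np.zeros(weights.shape[1])
    errors = []
    for _ in range(num_steps):  # num_steps is an illustrative assumption
        reconstruction = toy_generator(noise, weights)
        errors.append(float(np.linalg.norm(image - reconstruction)))
        noise = noise + toy_inversion_step(image, reconstruction, weights)
    return noise, errors


rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 6))
image = toy_generator(rng.normal(size=6), weights)
noise, errors = iterative_inversion(image, weights)
print("reconstruction error per step:", errors)
print("NFEs for one inversion plus 3 edits:", total_nfe(3))
```

In this toy setting the reconstruction error shrinks at every step because each correction is computed from the gap between the input and the previous reconstruction. The inversion cost is paid once, so the total cost grows by only 4 NFEs for each further edit of the same image.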
