LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Abstract

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, significant effort is being devoted to enabling the modification of these images using text alone, as a means of offering intuitive and versatile editing. However, editing proves difficult for these generative models because of the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently produce an entirely different result, making it exceedingly challenging to attain a one-shot generation that accurately matches the user's intent. In addition, to edit a real image with these state-of-the-art tools, one must first invert the image into the pre-trained model's domain, which adds another factor that affects edit quality as well as latency. In this exploratory report, we propose LEDITS, a combined lightweight approach to real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real-image editing while also harnessing the editing capabilities of DDPM inversion. This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
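The abstract describes the combination only at a high level. As a hedged illustration, the toy Python sketch below shows how the two ingredients could fit together, assuming the standard DDPM update x_{t-1} = mu_t(x_t) + sigma_t * z_t with an assumed variance choice sigma_t = sqrt(beta_t). Edit-friendly inversion draws each x_t independently from q(x_t | x_0) and then solves for the noise maps z_t that make the sampler reproduce the input; editing reruns the sampler with those fixed maps while adding Semantic Guidance (SEGA) style edit directions to the noise estimate. The noise predictor `toy_eps`, the 16-value 1-D "image", all function names, and the defaults (`cfg_scale`, `threshold`, the prompts and edit scale) are hypothetical placeholders; the guidance omits SEGA's warm-up and momentum and approximates its percentile threshold with a simple quantile. This is not the LEDITS implementation and does not operate in a real model's latent space.

```python
import math
import random


def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; list index t - 1 holds the values for step t."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_eps(x, t, prompt):
    """Hypothetical stand-in for a text-conditioned noise predictor ("" = unconditional)."""
    shift = (sum(map(ord, prompt)) % 7) / 7.0 - 0.5 if prompt else 0.0
    return [math.tanh(v + shift) * (1.0 + 0.01 * t) for v in x]


def guided_eps(x, t, src_prompt, edits=(), cfg_scale=7.5, threshold=0.9):
    """Classifier-free guidance on the source prompt plus simplified SEGA-style
    edit terms (no warm-up or momentum)."""
    unc = toy_eps(x, t, "")
    src = toy_eps(x, t, src_prompt)
    eps = [u + cfg_scale * (s - u) for u, s in zip(unc, src)]
    for edit_prompt, edit_scale in edits:
        psi = [edit_scale * (e - u) for e, u in zip(toy_eps(x, t, edit_prompt), unc)]
        mags = sorted(abs(p) for p in psi)
        cutoff = mags[int(threshold * (len(mags) - 1))]
        # Apply only the largest-magnitude components of the edit direction.
        eps = [v + (p if abs(p) >= cutoff else 0.0) for v, p in zip(eps, psi)]
    return eps


def posterior_mean(x, t, eps, sched):
    """DDPM mean mu_t(x_t) = (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)."""
    betas, alphas, alpha_bars = sched
    coef = betas[t - 1] / math.sqrt(1.0 - alpha_bars[t - 1])
    return [(v - coef * e) / math.sqrt(alphas[t - 1]) for v, e in zip(x, eps)]


def invert(x0, src_prompt, sched, rng):
    """Edit-friendly inversion: sample each x_t independently from q(x_t | x_0),
    then solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for the noise maps z_t."""
    betas, _, alpha_bars = sched
    T = len(betas)
    xs = [x0]
    for t in range(1, T + 1):
        ab = alpha_bars[t - 1]
        xs.append([math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0])
    zs = [None] * (T + 1)
    for t in range(T, 0, -1):
        mu = posterior_mean(xs[t], t, guided_eps(xs[t], t, src_prompt), sched)
        sigma = math.sqrt(betas[t - 1])  # assumed choice of sigma_t
        zs[t] = [(prev - m) / sigma for prev, m in zip(xs[t - 1], mu)]
    return xs[T], zs


def sample(xT, zs, src_prompt, sched, edits=()):
    """DDPM sampling that reuses the extracted noise maps instead of fresh noise."""
    betas = sched[0]
    x = xT
    for t in range(len(betas), 0, -1):
        mu = posterior_mean(x, t, guided_eps(x, t, src_prompt, edits), sched)
        x = [m + math.sqrt(betas[t - 1]) * z for m, z in zip(mu, zs[t])]
    return x


sched = linear_schedule(50)
rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy 16-value "image"
xT, zs = invert(x0, "a photo of a cat", sched, rng)
recon = sample(xT, zs, "a photo of a cat", sched)
edited = sample(xT, zs, "a photo of a cat", sched, edits=[("sunglasses", 5.0)])
print(max(abs(a - b) for a, b in zip(recon, x0)))
print(max(abs(a - b) for a, b in zip(edited, recon)))
```

With no edit terms, the reused noise maps reconstruct `x0` up to floating-point rounding, which is the property that lets edits start from a faithful reconstruction of the real image; adding an edit term to the guidance makes the output move away from that reconstruction without any optimization step.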
