LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Abstract

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Significant effort is currently devoted to modifying these images using text alone, as a means of offering intuitive and versatile editing. However, editing is difficult for these generative models because it inherently requires preserving certain content from the original image. In text-based models, by contrast, even minor modifications to the text prompt frequently produce an entirely different result, which makes one-shot generation that accurately matches the user's intent exceedingly challenging. In addition, to edit a real image with these state-of-the-art tools, one must first invert the image into the pre-trained model's domain, which adds another factor affecting edit quality as well as latency. In this exploratory report, we propose LEDITS, a combined lightweight approach for real-image editing that incorporates the Edit Friendly DDPM inversion technique with Semantic Guidance. It thereby extends Semantic Guidance to real-image editing while also harnessing the editing capabilities of DDPM inversion. The approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
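The abstract does not spell out how Semantic Guidance enters the denoising step, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes three noise estimates are already available at a given timestep (unconditional, prompt-conditioned, and edit-concept-conditioned) and adds a thresholded edit direction on top of classifier-free guidance. The function name `guided_noise` and the parameters `guidance_scale`, `edit_scale`, and `quantile` are hypothetical, and refinements such as warm-up steps, momentum, and negative (removal) guidance are omitted.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, eps_edit,
                 guidance_scale=7.5, edit_scale=5.0, quantile=0.9):
    """Combine classifier-free guidance with one semantic edit direction.

    All names and defaults are illustrative assumptions, not values
    taken from the report.
    """
    # Standard classifier-free guidance term toward the text prompt.
    cfg_term = guidance_scale * (eps_cond - eps_uncond)

    # Direction that moves the estimate toward the edit concept.
    edit_dir = eps_edit - eps_uncond

    # Keep only the largest-magnitude entries of the edit direction so
    # the edit stays localized and the rest of the image is preserved.
    threshold = np.quantile(np.abs(edit_dir), quantile)
    mask = np.abs(edit_dir) >= threshold

    return eps_uncond + cfg_term + edit_scale * mask * edit_dir


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 8, 8)  # toy latent shape, chosen only for the demo
    eps_u, eps_c, eps_e = (rng.standard_normal(shape) for _ in range(3))
    print(guided_noise(eps_u, eps_c, eps_e).shape)
```

In a real-image setting, the latents and noise maps fed to the denoiser would come from the inversion step rather than from random sampling as in this toy demo.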
