LIME: Localized Image Editing via Attention Regularization in Diffusion Models
December 14, 2023
Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
cs.AI
Abstract
Diffusion models (DMs) have gained prominence due to their ability to
generate high-quality, varied images, with recent advancements in text-to-image
generation. The research focus is now shifting towards the controllability of
DMs. A significant challenge within this domain is localized editing, where
specific areas of an image are modified without affecting the rest of the
content. This paper introduces LIME, a method for localized image editing in
diffusion models that does not require user-specified regions of interest (RoI)
or additional text input. Our method employs features from pre-trained methods and
a simple clustering technique to obtain precise semantic segmentation maps.
Then, by leveraging cross-attention maps, it refines these segments for
localized edits. Finally, we propose a novel cross-attention regularization
technique that penalizes unrelated cross-attention scores in the RoI during the
denoising steps, ensuring localized edits. Our approach, without re-training
or fine-tuning, consistently improves the performance of existing methods on
various editing benchmarks.
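
To make the regularization idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the penalty is a constant subtracted from the pre-softmax cross-attention scores of "unrelated" text tokens at pixels inside the RoI. The function name `regularized_attention`, the `penalty` value, and the choice of which tokens count as unrelated are all hypothetical, chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_attention(logits, roi_mask, unrelated_tokens, penalty=10.0):
    """Apply a hypothetical cross-attention penalty inside the RoI.

    logits: one row per pixel, each row holding one score per text token.
    roi_mask: one bool per pixel, True if the pixel lies in the RoI.
    unrelated_tokens: indices of text tokens assumed unrelated to the edit.
    penalty: assumed constant subtracted from unrelated scores in the RoI.
    """
    attention = []
    for row, in_roi in zip(logits, roi_mask):
        if in_roi:
            # Lower the scores of unrelated tokens only for RoI pixels.
            row = [s - penalty if j in unrelated_tokens else s
                   for j, s in enumerate(row)]
        attention.append(softmax(row))
    return attention


# Two pixels, three text tokens; only the first pixel is in the RoI.
logits = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
roi_mask = [True, False]
attn = regularized_attention(logits, roi_mask, unrelated_tokens={0, 1})
```

In this toy input, the RoI pixel ends up attending almost entirely to the remaining token, while the pixel outside the RoI keeps its original uniform attention, which is the localized behavior the abstract describes.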