LIME: Localized Image Editing via Attention Regularization in Diffusion Models
December 14, 2023
作者: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
cs.AI
Abstract
Diffusion models (DMs) have gained prominence due to their ability to
generate high-quality, varied images, with recent advancements in text-to-image
generation. The research focus is now shifting towards the controllability of
DMs. A significant challenge within this domain is localized editing, where
specific areas of an image are modified without affecting the rest of the
content. This paper introduces LIME for localized image editing in diffusion
models that do not require user-specified regions of interest (RoI) or
additional text input. Our method employs features from pre-trained methods and
a simple clustering technique to obtain precise semantic segmentation maps.
Then, by leveraging cross-attention maps, it refines these segments for
localized edits. Finally, we propose a novel cross-attention regularization
technique that penalizes unrelated cross-attention scores in the RoI during the
denoising steps, ensuring localized edits. Our approach, without re-training
or fine-tuning, consistently improves the performance of existing methods on
various editing benchmarks.
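To make the regularization idea concrete, below is a minimal sketch of penalizing unrelated cross-attention scores inside the RoI before the softmax, so probability mass concentrates on the edit-related prompt tokens. This is an illustration under stated assumptions, not the paper's implementation: the function name, the flattened `(pixels, tokens)` logit layout, and the penalty value `alpha` are all hypothetical.

```python
import numpy as np

def regularize_cross_attention(attn_logits, roi_mask, edit_token_ids, alpha=10.0):
    """Down-weight edit-unrelated tokens inside the RoI (illustrative sketch).

    attn_logits:    (num_pixels, num_tokens) pre-softmax cross-attention scores
    roi_mask:       (num_pixels,) boolean mask marking the region of interest
    edit_token_ids: indices of prompt tokens that describe the edit
    alpha:          penalty subtracted from unrelated tokens' logits (hypothetical)
    """
    out = attn_logits.copy()
    unrelated = np.ones(attn_logits.shape[1], dtype=bool)
    unrelated[edit_token_ids] = False
    # Inside the RoI, subtract a penalty from tokens unrelated to the edit so
    # the subsequent softmax assigns most attention to the edit tokens.
    # Pixels outside the RoI are left untouched.
    out[np.ix_(roi_mask, unrelated)] -= alpha
    return out

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)
```

With uniform logits, applying the penalty makes RoI pixels attend almost exclusively to the edit token, while pixels outside the RoI keep their original attention distribution, which is the intended "localized" behavior.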