MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
December 18, 2023
Authors: Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou
cs.AI
Abstract
Recent diffusion-based image editing approaches have exhibited impressive
editing capabilities in images with simple compositions. However, localized
editing in complex scenarios has not been well-studied in the literature,
despite its growing real-world demands. Existing mask-based inpainting methods
fall short of retaining the underlying structure within the edit region.
Meanwhile, mask-free attention-based methods often exhibit editing leakage and
misalignment in more complex compositions. In this work, we develop
MAG-Edit, a training-free, inference-stage optimization method,
which enables localized image editing in complex scenarios. In particular,
MAG-Edit optimizes the noise latent feature in diffusion models by maximizing
two mask-based cross-attention constraints of the edit token, which in turn
gradually enhances the local alignment with the desired prompt. Extensive
quantitative and qualitative experiments demonstrate the effectiveness of our
method in achieving both text alignment and structure preservation for
localized editing within complex scenarios.
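
The abstract describes the method only at a high level: optimizing the noise latent at inference time so that the edit token's cross-attention concentrates inside a given mask. The PyTorch sketch below illustrates one plausible reading of that idea, not the paper's exact formulation. The two ratio terms, the tensor shapes, the `edit_idx` choice, and the tiny linear "UNet" stub standing in for the real diffusion backbone are all illustrative assumptions.

```python
import torch

def mag_style_loss(attn, mask, edit_idx, eps=1e-8):
    """Two illustrative mask-based cross-attention ratios for the edit token.

    attn:     (heads, H*W, n_tokens) cross-attention probabilities
    mask:     (H*W,) binary mask of the edit region
    edit_idx: column index of the edit token in the prompt

    A sketch of the general idea (maximize how concentrated the edit token's
    attention is inside the mask), assumed for illustration only.
    """
    a = attn.mean(dim=0)                 # (H*W, n_tokens), averaged over heads
    edit = a[:, edit_idx]                # edit token's spatial attention

    # Ratio 1: fraction of the edit token's attention that lands inside the mask.
    spatial = (edit * mask).sum() / (edit.sum() + eps)

    # Ratio 2: inside the mask, share of total attention taken by the edit token.
    inside = a[mask.bool()]              # (n_inside, n_tokens)
    token = inside[:, edit_idx].sum() / (inside.sum() + eps)

    return -(spatial + token)            # negate so maximizing = minimizing loss


# Toy optimization loop: only the noise latent is updated, mirroring a
# training-free, inference-stage scheme. The real attention maps would come
# from the diffusion UNet; a small linear layer stands in here so the example
# stays self-contained and runnable.
torch.manual_seed(0)
latent = torch.randn(1, 4, 8, 8, requires_grad=True)   # stand-in noise latent
mask = torch.zeros(64); mask[:16] = 1.0                 # stand-in edit-region mask
unet_stub = torch.nn.Linear(4 * 8 * 8, 4 * 64 * 10)     # fake attention producer

for _ in range(5):
    logits = unet_stub(latent.flatten(1)).view(4, 64, 10)
    attn = logits.softmax(dim=-1)                       # fake cross-attention maps
    loss = mag_style_loss(attn, mask, edit_idx=3)
    loss.backward()
    with torch.no_grad():
        latent -= 0.5 * latent.grad                     # gradient step on the latent only
        latent.grad = None
```

In an actual diffusion pipeline, the attention maps would be collected from the UNet's cross-attention layers during the denoising step, so the gradient of this loss flows back to the noise latent through the network rather than through the toy stub used here.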