MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
December 18, 2023
Authors: Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou
cs.AI
Abstract
Recent diffusion-based image editing approaches have exhibited impressive
editing capabilities in images with simple compositions. However, localized
editing in complex scenarios has not been well-studied in the literature,
despite its growing real-world demands. Existing mask-based inpainting methods
fall short of retaining the underlying structure within the edit region.
Meanwhile, mask-free attention-based methods often exhibit editing leakage and
misalignment in more complex compositions. In this work, we develop
MAG-Edit, a training-free, inference-stage optimization method,
which enables localized image editing in complex scenarios. In particular,
MAG-Edit optimizes the noise latent feature in diffusion models by maximizing
two mask-based cross-attention constraints of the edit token, which in turn
gradually enhances the local alignment with the desired prompt. Extensive
quantitative and qualitative experiments demonstrate the effectiveness of our
method in achieving both text alignment and structure preservation for
localized editing within complex scenarios.
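The abstract describes the core mechanism only at a high level: the noise latent is optimized at inference time to maximize mask-based cross-attention constraints for the edit token. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the constraint shown (the fraction of the edit token's attention falling inside the edit mask) and the names `mask_ratio_constraint`, `attn_map_fn`, and `update_latent` are assumptions, and the paper's two constraints may take different forms.

```python
# Hypothetical sketch of mask-based attention-adjusted guidance (assumed, not the authors' code).
import torch


def mask_ratio_constraint(attn_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Fraction of the edit token's cross-attention that lies inside the edit mask.

    attn_map: (H, W) non-negative cross-attention map of the edit token.
    mask:     (H, W) binary mask of the region to be edited.
    """
    inside = (attn_map * mask).sum()
    total = attn_map.sum() + 1e-8  # avoid division by zero
    return inside / total


def update_latent(latent: torch.Tensor, attn_map_fn, mask: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    """One gradient-ascent step on the noise latent to maximize the constraint.

    attn_map_fn: callable mapping a latent to the edit token's attention map;
                 in practice this would run the denoising U-Net and collect
                 its cross-attention layers (assumed here for illustration).
    """
    latent = latent.detach().requires_grad_(True)
    ratio = mask_ratio_constraint(attn_map_fn(latent), mask)
    loss = -ratio  # maximize the in-mask attention ratio
    loss.backward()
    with torch.no_grad():
        latent = latent - lr * latent.grad
    return latent.detach()
```

In such a scheme, the step would be repeated at selected denoising timesteps so that the edit token's attention is gradually concentrated inside the masked region, which is consistent with the abstract's claim of "gradually" enhancing local alignment with the desired prompt.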