LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Abstract

Text-guided image editing aims to modify specific regions of an image according to natural language instructions while preserving the overall structure and background fidelity. Existing methods identify the target regions using masks derived from the cross-attention maps of diffusion models. However, because cross-attention focuses on semantic relevance, these maps struggle to preserve image integrity. As a result, such methods often lack spatial consistency, which leads to editing artifacts and distortions. In this work, we address these limitations and introduce LOCATEdit, which refines cross-attention maps through a graph-based approach that uses patch relationships derived from self-attention to keep attention smooth and coherent across image regions. This confines alterations to the designated items while retaining the surrounding structure. LOCATEdit consistently and substantially outperforms existing baselines on PIE-Bench, demonstrating state-of-the-art performance and effectiveness across various editing tasks. Code is available at https://github.com/LOCATEdit/LOCATEdit/
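The graph-based refinement described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following is an assumption: a standard graph-Laplacian regularization that minimizes ||m - m0||^2 + lam * m^T L m, where m0 is a flattened cross-attention map and L = D - W is the Laplacian of a patch graph whose edge weights W come from symmetrized self-attention. The function name `refine_attention` and the parameters `lam` and `iters` are hypothetical and are not taken from the LOCATEdit code.

```python
def refine_attention(cross_attn, self_attn, lam=1.0, iters=200):
    """Smooth a flattened cross-attention map over a patch graph.

    Assumed objective (not confirmed by the source):
        minimize ||m - m0||^2 + lam * m^T L m,  with L = D - W.
    Its minimizer solves (I + lam * L) m = m0. The system is strictly
    diagonally dominant, so plain Jacobi iteration converges.
    """
    n = len(cross_attn)
    # Edge weights: symmetrized self-attention with self-loops removed.
    w = [[0.0 if i == j else 0.5 * (self_attn[i][j] + self_attn[j][i])
          for j in range(n)] for i in range(n)]
    # Node degrees form the diagonal of D.
    deg = [sum(row) for row in w]
    m = list(cross_attn)
    for _ in range(iters):
        # Jacobi update: m_i = (m0_i + lam * sum_j W_ij m_j) / (1 + lam * d_i)
        m = [(cross_attn[i] + lam * sum(w[i][j] * m[j] for j in range(n)))
             / (1.0 + lam * deg[i]) for i in range(n)]
    return m


if __name__ == "__main__":
    # Toy example: four patches, where (0, 1) and (2, 3) are strongly linked
    # by self-attention and the two pairs are only weakly linked to each other.
    self_attn = [
        [0.50, 0.50, 0.00, 0.00],
        [0.50, 0.45, 0.05, 0.00],
        [0.00, 0.05, 0.45, 0.50],
        [0.00, 0.00, 0.50, 0.50],
    ]
    noisy_cross_attn = [0.9, 0.2, 0.1, 0.0]
    print(refine_attention(noisy_cross_attn, self_attn, lam=5.0))
```

In this toy setup, patch 1 is pulled toward the high value of its strongly connected neighbor, patch 0, while patches 2 and 3 stay low. This is the kind of spatially coherent attention the abstract describes. A larger `lam` smooths more aggressively within connected regions.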
