MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Abstract

Recent diffusion-based image editing approaches have shown impressive editing capabilities on images with simple compositions. However, localized editing in complex scenarios has not been well studied in the literature, despite growing real-world demand. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method that enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
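The optimization described above can be illustrated with a toy sketch. It is not the MAG-Edit implementation: the abstract does not define the two constraints, so the `spatial_ratio` and `token_ratio` terms below are assumed, illustrative forms, and the tiny softmax attention, the finite-difference gradient, and all function names are hypothetical stand-ins for a real diffusion model's cross-attention layers and automatic differentiation. The sketch only shows the general idea of nudging a latent so that an edit token's cross-attention concentrates inside a mask.

```python
import math


def attention(latent, keys):
    """Per-pixel softmax over tokens of the dot product with each token key."""
    maps = []
    for pixel in latent:
        scores = [sum(a * b for a, b in zip(pixel, key)) for key in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        maps.append([e / total for e in exps])
    return maps


def objective(latent, keys, mask, edit_token):
    """Sum of two assumed mask-based ratios for the edit token (illustrative only)."""
    attn = attention(latent, keys)
    inside = sum(row[edit_token] for row, m in zip(attn, mask) if m)
    everywhere = sum(row[edit_token] for row in attn)
    # Assumed form 1: share of the edit token's attention that lies inside the mask.
    spatial_ratio = inside / everywhere
    # Assumed form 2: share of the attention inside the mask that goes to the edit token.
    # Each softmax row sums to 1, so the denominator equals the number of masked pixels.
    token_ratio = inside / sum(mask)
    return spatial_ratio + token_ratio


def ascent_step(latent, keys, mask, edit_token, lr=0.5, eps=1e-5):
    """One gradient-ascent update on the latent, using forward finite differences."""
    work = [list(pixel) for pixel in latent]  # copy so the caller's latent is untouched
    base = objective(work, keys, mask, edit_token)
    grad = [[0.0] * len(pixel) for pixel in work]
    for i, pixel in enumerate(work):
        for j in range(len(pixel)):
            original = pixel[j]
            pixel[j] = original + eps
            grad[i][j] = (objective(work, keys, mask, edit_token) - base) / eps
            pixel[j] = original
    return [[v + lr * g for v, g in zip(pixel, g_row)]
            for pixel, g_row in zip(work, grad)]


def optimize(latent, keys, mask, edit_token, steps=20, lr=0.5):
    """Repeat the ascent step and record the objective after each update."""
    history = [objective(latent, keys, mask, edit_token)]
    for _ in range(steps):
        latent = ascent_step(latent, keys, mask, edit_token, lr=lr)
        history.append(objective(latent, keys, mask, edit_token))
    return latent, history


# Toy data: 4 "pixels" with 2-D features, 3 prompt tokens, mask over the first 2 pixels.
demo_latent = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.0]]
demo_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
demo_mask = [1, 1, 0, 0]
_, demo_history = optimize(demo_latent, demo_keys, demo_mask, edit_token=1)
print("objective before:", demo_history[0], "after:", demo_history[-1])
```

In a real pipeline the gradient would presumably come from backpropagation through the denoising network's cross-attention maps at each inference step rather than from finite differences; the toy version trades that for a dependency-free example.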
