FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

Abstract

Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, which hinders real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity while performing edits in under 0.2 seconds, a speedup of more than 150× over prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
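
The abstract describes BG-Shield and SSCA only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a binary edit mask is available, that features are stored as an (H, W, C) array, and that attention scores are stored as a (pixels, tokens) array. The function names `blend_with_mask` and `sparsify_attention` are hypothetical and do not come from the FlashEdit code.

```python
import numpy as np


def blend_with_mask(source_feat, edited_feat, mask):
    """Keep source features outside the mask and edited features inside it.

    Hypothetical illustration of mask-gated feature replacement;
    not the BG-Shield implementation from the paper.
    """
    # Expand the (H, W) mask to (H, W, 1) so it broadcasts over channels.
    gate = mask.astype(source_feat.dtype)[..., None]
    return gate * edited_feat + (1.0 - gate) * source_feat


def sparsify_attention(scores, mask):
    """Softmax over text tokens, then zero the rows of background pixels.

    Hypothetical illustration of spatially masked cross-attention;
    not the SSCA implementation from the paper.
    """
    # Subtract the row-wise maximum for a numerically stable softmax.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Flatten the (H, W) mask to (H*W, 1) and suppress background rows.
    return weights * mask.reshape(-1, 1)


# Toy example: a 2x2 image with 3 channels, where only the top-left pixel is edited.
source = np.zeros((2, 2, 3))
edited = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])

blended = blend_with_mask(source, edited, mask)
attention = sparsify_attention(np.zeros((4, 2)), mask)
```

In this toy case only the masked pixel takes the edited features, and only its attention row stays non-zero. Every background pixel keeps its source features exactly, which is the property the abstract attributes to restricting modifications to the edit region.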
