FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

Abstract

Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, which hinders real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity while performing edits in under 0.2 seconds, a speedup of more than 150× over prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
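
The abstract describes BG-Shield only as modifying features selectively inside the edit region. A minimal sketch of that general idea, mask-gated blending of two feature maps, might look like the following. This is an illustration under stated assumptions, not the authors' implementation: the function name `shield_background`, the binary mask, and the toy 2×3 feature maps are all hypothetical.

```python
def shield_background(source, edited, mask):
    """Blend two same-shaped 2D feature maps.

    Hypothetical helper: takes the edited value where mask is 1
    and keeps the source value where mask is 0.
    """
    return [
        [m * e + (1 - m) * s for s, e, m in zip(src_row, edit_row, mask_row)]
        for src_row, edit_row, mask_row in zip(source, edited, mask)
    ]


# Toy data (assumed for illustration): 1.0 marks source features,
# 9.0 marks edited features, and the mask selects the edit region.
source = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
edited = [[9.0, 9.0, 9.0], [9.0, 9.0, 9.0]]
mask = [[0, 1, 0], [0, 1, 1]]

blended = shield_background(source, edited, mask)
print(blended)
```

Positions outside the mask keep their source values exactly, which is the property the abstract calls background preservation; how FlashEdit obtains the mask and which features it gates are not specified in the abstract.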
