
FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

September 26, 2025
Authors: Junyi Wu, Zhiteng Li, Haotong Qin, Xiaohong Liu, Linghe Kong, Yulun Zhang, Xiaokang Yang
cs.AI

Abstract

Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity, while performing edits in under 0.2 seconds, which is an over 150× speedup compared to prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
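The abstract's two localization ideas — restricting feature modification to the edit region (BG-Shield) and suppressing cross-attention output outside it (SSCA) — can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's actual implementation: the function names, the hard binary mask, and the plain softmax attention are all hypothetical stand-ins for whatever the authors do inside their one-step diffusion backbone.

```python
import numpy as np

def bg_shield(feat_orig, feat_edit, edit_mask):
    """Hypothetical BG-Shield-style blend: outside the edit region (mask == 0),
    the original features pass through unchanged, so the background is preserved
    exactly; inside the region, the edited features are used."""
    m = edit_mask[..., None]  # (H, W, 1), broadcast over the channel axis
    return m * feat_edit + (1.0 - m) * feat_orig

def sparsified_cross_attention(queries, keys, values, edit_mask_flat):
    """Hypothetical SSCA-style step: ordinary cross-attention from spatial
    queries to text tokens, followed by zeroing the output for background
    positions so no prompt semantics leak outside the edit region."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (N_spatial, N_text)
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ values                              # (N_spatial, d_v)
    return out * edit_mask_flat[:, None]             # suppress background leakage

# Toy usage: a 4x4 feature map edited only in the central 2x2 window.
feat_orig = np.ones((4, 4, 3))
feat_edit = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
blended = bg_shield(feat_orig, feat_edit, mask)      # background stays all-ones
```

The key property of both pieces is the same: a spatial mask gates where new information may land, which is what lets the method claim exact background consistency rather than approximate reconstruction.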