

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

September 26, 2025
Authors: Junyi Wu, Zhiteng Li, Haotong Qin, Xiaohong Liu, Linghe Kong, Yulun Zhang, Xiaokang Yang
cs.AI

Abstract

Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity while performing edits in under 0.2 seconds, an over 150× speedup compared to prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
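
The abstract describes BG-Shield and SSCA only at a high level. Below is a minimal PyTorch-style sketch of how the two masking ideas could look, assuming a binary edit-region mask: feature-level blending that keeps source features in the background, and a cross-attention whose text-driven update is zeroed outside the edit region. All function and tensor names here are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of the two masking ideas named in the abstract.
# Names are illustrative and do not come from the FlashEdit codebase.
import torch


def bg_shield(edit_feats, src_feats, edit_mask):
    """Blend feature maps so only the edit region takes edited features;
    the background keeps the source-image features.

    edit_feats, src_feats: (B, C, H, W) feature maps
    edit_mask:             (B, 1, H, W) binary mask, 1 inside the edit region
    """
    return edit_mask * edit_feats + (1.0 - edit_mask) * src_feats


def sparsified_spatial_cross_attention(q, k, v, edit_mask_flat):
    """Cross-attention whose text-conditioned update is suppressed at
    background spatial positions, limiting semantic leakage.

    q:              (B, N, d) spatial queries (N = H * W)
    k, v:           (B, T, d) text keys / values
    edit_mask_flat: (B, N, 1) binary mask over spatial positions
    """
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    out = attn @ v                  # (B, N, d) text-driven update
    return out * edit_mask_flat     # zero the update outside the edit region


if __name__ == "__main__":
    # Smoke test with random tensors and a square edit region.
    B, C, H, W, T, d = 1, 4, 8, 8, 7, 4
    mask = torch.zeros(B, 1, H, W)
    mask[..., 2:6, 2:6] = 1.0
    blended = bg_shield(torch.randn(B, C, H, W), torch.randn(B, C, H, W), mask)
    out = sparsified_spatial_cross_attention(
        torch.randn(B, H * W, d), torch.randn(B, T, d), torch.randn(B, T, d),
        mask.flatten(2).transpose(1, 2),
    )
    print(blended.shape, out.shape)
```

Read this way, the same edit mask is applied twice: once to blend features so background content is preserved, and once to confine the text prompt's influence to the edit region, which matches the background-consistency claims in the abstract.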