FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

Abstract

Text-guided image editing with diffusion models has achieved remarkable quality, but its latency is high enough to hinder real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity while performing edits in under 0.2 seconds, a speedup of more than 150× over prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
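The abstract describes BG-Shield only at a high level, as modifying features solely within the edit region. One common way to realize that idea is a mask-based blend of feature maps, sketched below in NumPy. This is an illustrative assumption, not the authors' implementation: the function name `shielded_update`, the (C, H, W) feature layout, and the binary mask convention are all hypothetical.

```python
import numpy as np

def shielded_update(original_feats, edited_feats, edit_mask):
    """Blend two feature maps so only the masked region changes.

    original_feats, edited_feats: arrays of shape (C, H, W) (assumed layout).
    edit_mask: array of shape (H, W), 1 inside the edit region, 0 elsewhere.
    """
    # Add a leading axis so the mask broadcasts across all channels.
    mask = edit_mask.astype(original_feats.dtype)[None, :, :]
    # Inside the mask take the edited features; outside keep the originals.
    return mask * edited_feats + (1.0 - mask) * original_feats

# Toy example: a 2-channel 3x3 feature map with a single-pixel edit region.
original = np.zeros((2, 3, 3))
edited = np.ones((2, 3, 3))
mask = np.zeros((3, 3))
mask[1, 1] = 1

blended = shielded_update(original, edited, mask)
```

In this toy case only position (1, 1) takes the edited value in each channel, while every background position keeps its original value exactly, which is the property the abstract attributes to background preservation.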
