SpotEdit: Selective Region Editing in Diffusion Transformers
December 26, 2025
Authors: Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, Xinchao Wang
cs.AI
Abstract
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits modify only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector, which identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features, and SpotFusion, which adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
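The abstract names the two mechanisms but not their exact form, so the following is only an illustrative reading, not the paper's method: a minimal PyTorch sketch in which token-level cosine similarity stands in for "perceptual similarity" and a fixed scalar weight stands in for the dynamic fusion. The function names `spot_selector` and `spot_fusion` and the parameters `tau` and `alpha` are all hypothetical.

```python
import torch
import torch.nn.functional as F

def spot_selector(cond_tokens, noisy_tokens, tau=0.9):
    """Hypothetical SpotSelector sketch: mark tokens whose conditional-image
    features are close to the current latent tokens as 'stable', so their
    denoising computation can be skipped and the conditional feature reused.

    cond_tokens, noisy_tokens: (B, N, D) token features.
    Returns a boolean mask (B, N): True = stable, reuse conditional feature.
    """
    # Cosine similarity as a stand-in for the paper's perceptual similarity.
    sim = F.cosine_similarity(cond_tokens, noisy_tokens, dim=-1)  # (B, N)
    return sim > tau

def spot_fusion(cond_tokens, edited_tokens, stable_mask, alpha=0.8):
    """Hypothetical SpotFusion sketch: blend reused conditional features with
    freshly denoised (edited) tokens so skipped regions stay coherent with
    edited ones. `alpha` is a fixed scalar here; the paper describes the
    fusion weight as dynamic.
    """
    w = alpha * stable_mask.unsqueeze(-1).float()        # (B, N, 1)
    return w * cond_tokens + (1.0 - w) * edited_tokens

# Toy usage: 2 images, 16 tokens, 64-dim features.
cond = torch.randn(2, 16, 64)
noisy = cond + 0.05 * torch.randn_like(cond)   # latents mostly unchanged
mask = spot_selector(cond, noisy)              # stable-region mask
edited = noisy                                 # stand-in for one DiT denoising step
out = spot_fusion(cond, edited, mask)
print(mask.float().mean().item(), out.shape)
```

Under these assumptions, the compute saving would come from excluding masked tokens from the transformer's denoising pass, while the blend keeps the boundary between reused and edited tokens contextually smooth.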