SpotEdit: Selective Region Editing in Diffusion Transformers
December 26, 2025
Authors: Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, Xinchao Wang
cs.AI
Abstract
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
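The two components can be illustrated with a minimal sketch. The abstract does not give concrete formulas, so everything below is an assumption: the function names, the use of cosine similarity as the "perceptual similarity" measure, and the fixed blending weight `alpha` are all hypothetical simplifications of SpotSelector and SpotFusion, shown only to make the selective-update idea concrete.

```python
import numpy as np

def select_stable_tokens(cond_feats, edit_feats, threshold=0.9):
    """SpotSelector-style step (hypothetical simplification): mark tokens
    whose current features remain close to the conditional-image features
    as 'stable', so their denoising computation can be skipped and the
    cached conditional features reused."""
    num = np.sum(cond_feats * edit_feats, axis=-1)
    den = (np.linalg.norm(cond_feats, axis=-1)
           * np.linalg.norm(edit_feats, axis=-1) + 1e-8)
    sim = num / den  # cosine similarity per token
    return sim >= threshold  # True = stable, reuse cached features

def spot_fusion(cond_feats, edit_feats, stable_mask, alpha=0.7):
    """SpotFusion-style step (hypothetical): blend reused conditional
    features with edited tokens. Stable tokens lean on the cached
    conditional features; edited tokens pass through unchanged."""
    w = np.where(stable_mask[..., None], alpha, 0.0)
    return w * cond_feats + (1.0 - w) * edit_feats

# Toy usage: 4 tokens with 8-dim features; token 0 was edited.
cond = np.ones((4, 8))
edit = cond.copy()
edit[0] = -cond[0]  # the edited token diverges from the condition
mask = select_stable_tokens(cond, edit)
fused = spot_fusion(cond, edit, mask)
```

In this toy run, only tokens 1-3 are flagged stable and blended toward the cached conditional features, while the edited token 0 keeps its new value; in the actual framework this masking would also let the transformer skip attention/MLP computation for the stable tokens, which is where the efficiency gain comes from.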