RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
October 29, 2025
Authors: Pengtao Chen, Xianfang Zeng, Maosen Zhao, Mingzhu Shen, Peng Ye, Bangyin Xiang, Zhibo Wang, Wei Cheng, Gang Yu, Tao Chen
cs.AI
Abstract
Recently, instruction-based image editing (IIE) has received widespread
attention. In practice, IIE often modifies only specific regions of an image,
while the remaining areas largely remain unchanged. Although these two types of
regions differ significantly in generation difficulty and computational
redundancy, existing IIE models do not account for this distinction, instead
applying a uniform generation process across the entire image. This motivates
us to propose RegionE, an adaptive, region-aware generation framework that
accelerates IIE tasks without additional training. Specifically, the RegionE
framework consists of three main components: 1) Adaptive Region Partition. We
observe that the trajectory of unedited regions is straight, so the result of
multi-step denoising can be predicted in a single step. Therefore, in
the early denoising stages, we partition the image into edited and unedited
regions based on the difference between the final estimated result and the
reference image. 2) Region-Aware Generation. After distinguishing the regions,
we replace multi-step denoising with one-step prediction for unedited areas.
For edited regions, the trajectory is curved, requiring local iterative
denoising. To improve the efficiency and quality of local iterative generation,
we propose the Region-Instruction KV Cache, which reduces computational cost
while incorporating global information. 3) Adaptive Velocity Decay Cache.
Observing that adjacent timesteps in edited regions exhibit strong velocity
similarity, we further propose an adaptive velocity decay cache to accelerate
the local denoising process. We applied RegionE to state-of-the-art IIE base
models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit, achieving
speedups of 2.57×, 2.41×, and 2.06×, respectively. Evaluations by GPT-4o
confirmed that semantic and perceptual fidelity were well preserved.
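
The adaptive region partition described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a rectified-flow parameterization in which x_t = (1 - t) * x_0 + t * noise and the velocity is v = noise - x_0, so a straight trajectory can be extrapolated to t = 0 in one step. The function names, the mean-absolute-difference metric, and the threshold value are hypothetical choices made for illustration only.

```python
import numpy as np

def one_step_estimate(x_t, v_t, t):
    """Extrapolate to t = 0 along a straight trajectory: x0_hat = x_t - t * v_t.

    Assumes x_t = (1 - t) * x_0 + t * noise and v_t = noise - x_0 (illustrative convention).
    """
    return x_t - t * v_t

def partition_regions(x_t, v_t, t, reference, threshold=0.1):
    """Return a per-token boolean mask: True = edited, False = unedited.

    The metric (mean absolute difference per token) and the threshold
    are assumptions for this sketch, not values from the paper.
    """
    x0_hat = one_step_estimate(x_t, v_t, t)
    diff = np.abs(x0_hat - reference).mean(axis=-1)
    return diff > threshold

# Toy example: 6 tokens with 4 channels; only the first two tokens are edited.
rng = np.random.default_rng(0)
reference = rng.normal(size=(6, 4))
target = reference.copy()
target[:2] += 1.0

noise = rng.normal(size=(6, 4))
t = 0.8
x_t = (1 - t) * target + t * noise
# Here the velocity is exact for every token; in a real model only the
# unedited regions are expected to follow such a straight trajectory.
v_t = noise - target

mask = partition_regions(x_t, v_t, t, reference)
print(mask)
```

In this toy setting the one-step estimate recovers the target exactly, so the mask flags only the tokens that differ from the reference. Tokens marked unedited would keep the one-step prediction, while tokens marked edited would go on to local iterative denoising.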