
**ProEdit: Inversion-based Editing From Prompts Done Right**

December 26, 2025
Authors: Zhi Ouyang, Dian Zheng, Xiao-Ming Wu, Jian-Jian Jiang, Kun-Yu Lin, Jingke Meng, Wei-Shi Zheng
cs.AI

Abstract

Inversion-based visual editing provides an effective, training-free way to edit an image or a video according to user instructions. Existing methods typically inject source image information during sampling to keep the edit consistent with the source. However, this sampling strategy relies too heavily on source information, which degrades the edits in the target image (e.g., failing to change the subject's attributes such as pose, number, or color as instructed). In this work, we propose ProEdit, which addresses this issue at both the attention level and the latent level. At the attention level, we introduce KV-mix, which mixes the key-value (KV) features of the source and the target in the edited region, mitigating the influence of the source image on the edited region while maintaining background consistency. At the latent level, we propose Latents-Shift, which perturbs the edited region of the source latent, eliminating the influence of the inverted latent on sampling. Extensive experiments on several image and video editing benchmarks demonstrate that our method achieves state-of-the-art performance. In addition, our design is plug-and-play and can be seamlessly integrated into existing inversion and editing methods, such as RF-Solver, FireFlow, and UniEdit.
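
The abstract does not give the exact formulas, so the following is only a minimal NumPy sketch of what the two ideas could look like under stated assumptions. The function names `kv_mix` and `latents_shift`, the linear blend with weight `alpha`, the Gaussian perturbation with scale `strength`, and the per-token boolean `edit_mask` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def kv_mix(src, tgt, edit_mask, alpha=0.5):
    """Blend source and target key (or value) features per token.

    src, tgt:  arrays of shape (tokens, dim)
    edit_mask: boolean array of shape (tokens,), True inside the edited region
    alpha:     hypothetical weight kept on the source inside the edited region
    """
    # Assumed mixing rule: a simple linear blend of source and target features.
    mixed = alpha * src + (1.0 - alpha) * tgt
    # Background tokens keep the source features unchanged.
    return np.where(edit_mask[:, None], mixed, src)


def latents_shift(src_latent, edit_mask, strength=0.5, rng=None):
    """Perturb the source latent inside the edited region only.

    The Gaussian noise and the `strength` scale are assumptions; the abstract
    only states that the edited region of the source latent is perturbed.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(src_latent.shape)
    shifted = src_latent + strength * noise
    # Tokens outside the edited region are returned untouched.
    return np.where(edit_mask[:, None], shifted, src_latent)


# Toy usage: 4 tokens with 2 features each; tokens 1 and 2 are "edited".
src = np.zeros((4, 2))
tgt = np.ones((4, 2))
mask = np.array([False, True, True, False])
mixed_kv = kv_mix(src, tgt, mask, alpha=0.25)
shifted_latent = latents_shift(src, mask)
```

In this sketch the mask restricts both operations to the edited region, which mirrors the abstract's claim that background consistency is preserved while the source's influence on the edited region is reduced.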