CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

August 9, 2025
Authors: Weiyan Xie, Han Gao, Didan Deng, Kaican Li, April Hua Liu, Yongxiang Huang, Nevin L. Zhang
cs.AI

Abstract

Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations: (1) Selective Canny Control, which masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving details of the source images in unedited areas via inversion-phase ControlNet information retention. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance, which combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods such as KV-Edit, achieving a 2.93 to 10.49 percent improvement in the balance of text adherence and context fidelity. In terms of editing seamlessness, user studies reveal that only 49.2 percent of general users and 42.0 percent of AIGC experts identified CannyEdit's results as AI-edited when paired with unedited real images, versus 76.08 to 89.09 percent for competitor methods.
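To make the Selective Canny Control idea concrete, the following minimal sketch (not the paper's implementation) builds a Canny control map in which structural guidance is suppressed inside a user-specified editable region and preserved elsewhere, and pairs it with a hypothetical local/global prompt dictionary in the spirit of Dual-Prompt Guidance. File paths, prompt strings, and the function name are illustrative assumptions.

```python
# Illustrative sketch only: not the authors' code. Assumes OpenCV and NumPy.
import cv2
import numpy as np


def selective_canny(image_bgr: np.ndarray,
                    edit_mask: np.ndarray,
                    low: int = 100,
                    high: int = 200) -> np.ndarray:
    """Return a Canny edge map with edges removed inside the editable region.

    image_bgr: H x W x 3 source image (uint8).
    edit_mask: H x W mask, nonzero where edits are allowed.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)          # structural edges over the full image
    keep = (edit_mask == 0)                     # True outside the editable region
    # Drop guidance inside the edit region so the model can synthesize freely there,
    # while unedited areas keep their edge constraints.
    return np.where(keep, edges, 0).astype(np.uint8)


# Hypothetical dual-prompt pairing: a local prompt describing only the edited
# object, plus a global target prompt describing the whole scene after editing.
prompts = {
    "local": "a red umbrella",
    "global": "a woman walking on a rainy street holding a red umbrella",
}

if __name__ == "__main__":
    img = cv2.imread("source.jpg")                               # placeholder path
    mask = cv2.imread("edit_region.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
    control = selective_canny(img, mask)
    cv2.imwrite("selective_canny_control.png", control)
```

The masked edge map would then serve as the conditioning input to a Canny ControlNet, with the local and global prompts guiding the edited region and the overall scene respectively; how the paper retains ControlNet information during the inversion phase is not shown here.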