CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Abstract

Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance three goals: text adherence in edited regions, context fidelity in unedited areas, and seamless integration of the edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations. (1) Selective Canny Control masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving the details of the source image in unedited areas by retaining ControlNet information from the inversion phase. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods such as KV-Edit, achieving a 2.93 to 10.49 percent improvement in the balance between text adherence and context fidelity. In terms of editing seamlessness, user studies show that only 49.2 percent of general users and 42.0 percent of AIGC experts identified CannyEdit's results as AI-edited when they were paired with real, unedited images, versus 76.08 to 89.09 percent for competing methods.
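The masking idea behind Selective Canny Control can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the structural guidance can be represented as a small 2D grid of values and that the user mask is binary (1 = editable, 0 = preserve). The function name mask_control_signal and both inputs are hypothetical, chosen only for illustration.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def mask_control_signal(control: Grid, editable_mask: Mask) -> Grid:
    """Zero the control signal wherever the mask marks a cell as editable.

    Hypothetical helper: a toy stand-in for suppressing structural
    guidance inside the user-specified region while leaving it intact
    everywhere else.
    """
    same_rows = len(control) == len(editable_mask)
    same_cols = all(len(c) == len(m) for c, m in zip(control, editable_mask))
    if not (same_rows and same_cols):
        raise ValueError("control and editable_mask must have the same shape")
    return [
        [0.0 if m else c for c, m in zip(c_row, m_row)]
        for c_row, m_row in zip(control, editable_mask)
    ]


# Toy 2x3 "control signal" and a mask that marks three cells as editable.
control = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]
mask = [[0, 1, 1], [0, 0, 1]]

masked = mask_control_signal(control, mask)
print(masked)  # [[0.9, 0.0, 0.0], [0.6, 0.5, 0.0]]
```

In this toy setting, cells outside the mask keep their original guidance values, which mirrors the abstract's claim that unedited areas stay tied to the source structure while the editable region is left free for the text prompt to drive.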
