CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Abstract

Recent advances in text-to-image (T2I) models have enabled training-free regional image editing that leverages the generative priors of foundation models. However, existing methods struggle to balance three goals: text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations. (1) Selective Canny Control masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving details of the source image in unedited areas via inversion-phase ControlNet information retention. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods such as KV-Edit, achieving a 2.93 to 10.49 percent improvement in the balance of text adherence and context fidelity. In terms of editing seamlessness, user studies show that only 49.2 percent of general users and 42.0 percent of AIGC experts identified CannyEdit's results as AI-edited when they were paired with unedited real images, versus 76.08 to 89.09 percent for competing methods.
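The masking idea behind Selective Canny Control can be illustrated with a toy sketch. This is not the authors' implementation: the function name `apply_selective_control`, the binary `edit_mask`, the `strength` parameter, and the simple additive per-pixel residual are all assumptions made for illustration. It only reflects the general pattern that ControlNet-style guidance is typically added to the base model's features as a residual, and that such guidance can be suppressed inside a user-specified region.

```python
def apply_selective_control(hidden, control_residual, edit_mask, strength=1.0):
    """Toy illustration (hypothetical helper, not from the paper).

    hidden:           2D list of base-model feature values
    control_residual: 2D list of ControlNet-style residuals (same shape)
    edit_mask:        2D list, 1 = editable region, 0 = region to preserve
    strength:         assumed global scale on the control signal

    The residual is added only where edit_mask is 0, so structural
    guidance is dropped inside the editable region and kept elsewhere.
    """
    out = []
    for h_row, c_row, m_row in zip(hidden, control_residual, edit_mask):
        out.append([
            h + strength * (1 - m) * c
            for h, c, m in zip(h_row, c_row, m_row)
        ])
    return out


# Usage: the top-left pixel is editable, so it receives no control signal.
hidden = [[0.0, 0.0], [0.0, 0.0]]
control = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1, 0], [0, 0]]
result = apply_selective_control(hidden, control, mask)
```

In this sketch, text-driven generation is free to change the masked pixel, while the unmasked pixels stay tied to the edge-derived control signal.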
