LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Abstract

Diffusion models (DMs) have gained prominence for their ability to generate high-quality, varied images, particularly following recent advances in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant challenge within this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper introduces LIME, a method for localized image editing in diffusion models that does not require user-specified regions of interest (RoI) or additional text input. Our method employs features from pre-trained methods and a simple clustering technique to obtain precise semantic segmentation maps. Then, by leveraging cross-attention maps, it refines these segments for localized edits. Finally, we propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits. Our approach, without re-training or fine-tuning, consistently improves the performance of existing methods on various editing benchmarks.
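
The regularization step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the penalty is a constant subtracted from the raw (pre-softmax) scores, and the names regularized_attention, roi_mask, unrelated_tokens, and penalty, along with the default penalty value, are hypothetical choices made only for illustration. It shows how lowering the scores of unrelated tokens inside the RoI shifts attention there towards the remaining tokens, while pixels outside the RoI are left unchanged.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_attention(scores, roi_mask, unrelated_tokens, penalty=10.0):
    """Return per-pixel attention weights over text tokens.

    scores: one row per pixel, each row holding one raw score per token.
    roi_mask: one bool per pixel, True where the pixel lies inside the RoI.
    unrelated_tokens: token indices assumed to be unrelated to the edit
        (hypothetical; how they are chosen is not specified here).
    penalty: assumed constant subtracted from the penalized scores.
    """
    weights = []
    for row, in_roi in zip(scores, roi_mask):
        if in_roi:
            # Penalize only unrelated tokens, and only inside the RoI.
            row = [s - penalty if t in unrelated_tokens else s
                   for t, s in enumerate(row)]
        weights.append(softmax(row))
    return weights


# Two pixels with identical raw scores; only the first lies inside the RoI.
raw_scores = [[2.0, 1.0, 0.5], [2.0, 1.0, 0.5]]
weights = regularized_attention(raw_scores, [True, False], unrelated_tokens={0})
for pixel, row in enumerate(weights):
    print(pixel, [round(w, 4) for w in row])
```

Subtracting before the softmax, rather than zeroing weights afterwards, keeps each row a valid probability distribution without a separate renormalization step.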
