Continuous Layout Editing of Single Images with Diffusion Models

Abstract

Recent advances in large-scale text-to-image diffusion models have enabled many applications in image editing. However, none of these methods can edit the layout of a single existing image. To address this gap, we propose the first framework that edits the layout of a single image while preserving its visual properties, which makes continuous editing of a single image possible. Our approach relies on two key modules. First, to preserve the characteristics of multiple objects within an image, we disentangle the concepts of the different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Second, we propose a training-free optimization method that performs layout control on a pre-trained diffusion model, which allows us to regenerate the image from the learned concepts and align it with a user-specified layout. As the first framework to edit the layout of existing images, we demonstrate that our method is effective and outperforms other baselines that were modified to support this task. Our code will be freely available for public use upon acceptance.
