Magic Insert: Style-Aware Drag-and-Drop

Abstract

We present Magic Insert, a method for dragging and dropping a subject from a user-provided image into a target image of a different style, placing it in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images. For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model on the subject image using LoRA and learned text tokens, and then infuses it with a CLIP representation of the target style. For object insertion, we use Bootstrapped Domain Adaptation to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting. Finally, we present a dataset, SubjectPlop, to facilitate evaluation and future progress in this area. Project page: https://magicinsert.github.io/
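
The abstract mentions LoRA fine-tuning without describing its configuration. As a rough illustration of the general LoRA idea only (not the authors' implementation), the sketch below adds a scaled low-rank update B·A to a frozen weight matrix W. The function names, matrix sizes, and the alpha/r scaling are assumptions chosen for illustration; in a diffusion model the update would typically be applied to attention projection layers.

```python
# Minimal pure-Python sketch of a LoRA weight update (illustrative assumption,
# not the paper's code): W_adapted = W + (alpha / r) * (B @ A).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(W, A, B, alpha):
    """Return the adapted weight; W stays frozen, only A and B would be trained."""
    r = len(A)                      # rank of the update = number of rows of A
    delta = matmul(B, A)            # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]               # frozen pretrained weight, 2 x 3
A = [[1.0, 1.0, 1.0]]               # down-projection, rank r = 1
B_zero = [[0.0], [0.0]]             # zero-initialized up-projection
B_trained = [[0.5], [-1.0]]         # hypothetical values after training

# With B = 0 the adapted layer is identical to the pretrained one.
unchanged = lora_weight(W, A, B_zero, alpha=2.0)
adapted = lora_weight(W, A, B_trained, alpha=2.0)
```

Because only A and B are trained, the number of trainable parameters grows with the rank r rather than with the full size of W, which is what makes this kind of per-subject fine-tuning cheap.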
