DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Abstract

Diffusion models have achieved remarkable image generation quality, surpassing previous generative models. However, a notable limitation of diffusion models in comparison to GANs is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such smooth interpolation is intriguing because it naturally serves as a solution to the image morphing task, which has many applications. In this work, we present DiffMorpher, the first approach enabling smooth and natural image interpolation using diffusion models. Our key idea is to capture the semantics of the two images by fitting a LoRA to each of them, and to interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition, where correspondence automatically emerges without the need for annotation. In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images. Extensive experiments demonstrate that DiffMorpher achieves starkly better image morphing effects than previous methods across a variety of object categories, bridging a critical functional gap that distinguished diffusion models from GANs.
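The key idea of interpolating both the LoRA parameters and the latent noises can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only says the two quantities are interpolated, so the choice of linear interpolation for the LoRA parameters and spherical interpolation (slerp) for the latent noises is an assumption made here. The function names (`lerp`, `slerp`, `morph_inputs`) and the representation of parameters and noises as flat lists of floats are hypothetical simplifications.

```python
import math

def lerp(a, b, alpha):
    """Linear interpolation between two equal-length vectors."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]

def slerp(a, b, alpha, eps=1e-8):
    """Spherical interpolation between two equal-length vectors.

    Assumption: slerp is used here because it roughly preserves the norm
    of Gaussian noise; the abstract does not name the interpolation rule.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, alpha)  # angle undefined for a zero vector
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        return lerp(a, b, alpha)  # (anti)parallel vectors: fall back to lerp
    w_a = math.sin((1.0 - alpha) * theta) / sin_theta
    w_b = math.sin(alpha * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]

def morph_inputs(lora_a, lora_b, noise_a, noise_b, num_frames):
    """Return (alpha, LoRA parameters, latent noise) for each morph frame.

    lora_a / lora_b: hypothetical flattened LoRA parameters fitted to
    image A and image B; noise_a / noise_b: their latent noises.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for i in range(num_frames):
        alpha = i / (num_frames - 1)  # 0.0 at image A, 1.0 at image B
        frames.append((alpha,
                       lerp(lora_a, lora_b, alpha),
                       slerp(noise_a, noise_b, alpha)))
    return frames
```

In a full pipeline, each frame's interpolated parameters and noise would be passed to the diffusion sampler to render one intermediate image. The attention interpolation and the sampling schedule mentioned in the abstract are not covered by this sketch.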
