DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
December 12, 2023
Authors: Kaiwen Zhang, Yifan Zhou, Xudong Xu, Xingang Pan, Bo Dai
cs.AI
Abstract
Diffusion models have achieved remarkable image generation quality, surpassing
previous generative models. However, a notable limitation of diffusion models,
in comparison to GANs, is their difficulty in smoothly interpolating between
two image samples, due to their highly unstructured latent space. Such a smooth
interpolation is intriguing as it naturally serves as a solution for the image
morphing task with many applications. In this work, we present DiffMorpher, the
first approach enabling smooth and natural image interpolation using diffusion
models. Our key idea is to capture the semantics of the two images by fitting
a separate LoRA to each, and to interpolate between both the LoRA
parameters and the latent noises to ensure a smooth semantic transition, where
correspondence automatically emerges without the need for annotation. In
addition, we propose an attention interpolation and injection technique and a
new sampling schedule to further enhance the smoothness between consecutive
images. Extensive experiments demonstrate that DiffMorpher achieves starkly
better image morphing effects than previous methods across a variety of object
categories, bridging a critical functional gap that distinguished diffusion
models from GANs.
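
The core interpolation idea can be illustrated with a minimal sketch. The abstract does not state the exact interpolation formulas, so this example assumes linear interpolation for the LoRA weight updates and spherical interpolation (slerp) for the latent noises, a common choice for Gaussian noise. All names here (`lerp`, `slerp`, `interpolate_lora`, the `attn.to_q` key, the array shapes) are hypothetical and are not taken from the authors' code; random arrays stand in for fitted LoRAs and inverted latents.

```python
import numpy as np

def lerp(a, b, alpha):
    """Linear interpolation between two arrays."""
    return (1.0 - alpha) * a + alpha * b

def slerp(z0, z1, alpha, eps=1e-7):
    """Spherical interpolation between two noise tensors (assumed choice)."""
    v0, v1 = z0.ravel(), z1.ravel()
    cos_theta = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.sin(theta) < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return lerp(z0, z1, alpha)
    w0 = np.sin((1.0 - alpha) * theta) / np.sin(theta)
    w1 = np.sin(alpha * theta) / np.sin(theta)
    return w0 * z0 + w1 * z1

def interpolate_lora(delta_a, delta_b, alpha):
    """Blend two sets of LoRA weight updates, layer by layer."""
    return {name: lerp(delta_a[name], delta_b[name], alpha) for name in delta_a}

# Stand-ins for the two fitted LoRAs and the two latent noises (hypothetical shapes).
rng = np.random.default_rng(0)
delta_a = {"attn.to_q": rng.normal(size=(4, 4))}
delta_b = {"attn.to_q": rng.normal(size=(4, 4))}
z_a = rng.normal(size=(4, 8, 8))
z_b = rng.normal(size=(4, 8, 8))

# One (LoRA update, latent noise) pair per intermediate frame.
alphas = np.linspace(0.0, 1.0, 5)
frames = [(interpolate_lora(delta_a, delta_b, a), slerp(z_a, z_b, a)) for a in alphas]
```

In a real pipeline, each pair in `frames` would parameterize one denoising run that produces one intermediate image. The attention interpolation and sampling schedule mentioned in the abstract are not covered by this sketch.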