DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
December 12, 2023
Authors: Kaiwen Zhang, Yifan Zhou, Xudong Xu, Xingang Pan, Bo Dai
cs.AI
Abstract
Diffusion models have achieved remarkable image generation quality surpassing
previous generative models. However, a notable limitation of diffusion models,
in comparison to GANs, is their difficulty in smoothly interpolating between
two image samples, due to their highly unstructured latent space. Such a smooth
interpolation is intriguing as it naturally serves as a solution for the image
morphing task with many applications. In this work, we present DiffMorpher, the
first approach enabling smooth and natural image interpolation using diffusion
models. Our key idea is to capture the semantics of the two images by fitting
two LoRAs to them respectively, and interpolate between both the LoRA
parameters and the latent noises to ensure a smooth semantic transition, where
correspondence automatically emerges without the need for annotation. In
addition, we propose an attention interpolation and injection technique and a
new sampling schedule to further enhance the smoothness between consecutive
images. Extensive experiments demonstrate that DiffMorpher achieves starkly
better image morphing effects than previous methods across a variety of object
categories, bridging a critical functional gap that distinguished diffusion
models from GANs.
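The core recipe described in the abstract, fit one LoRA to each endpoint image and then interpolate both the LoRA parameters and the initial latent noises, can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration rather than the authors' implementation: the helper names `lerp_state_dicts` and `slerp` are made up for this example, and linear blending of parameters plus spherical interpolation of Gaussian noise are common conventions assumed here, not details confirmed by the abstract.

```python
# Illustrative sketch (not the DiffMorpher codebase): blend two LoRA weight
# sets linearly and two latent noises spherically for each frame of a morph.
import torch

def lerp_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two LoRA state dicts with matching keys."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

def slerp(z_a, z_b, alpha, eps=1e-7):
    """Spherical linear interpolation between two latent noise tensors."""
    a, b = z_a.flatten(), z_b.flatten()
    cos = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos)
    if theta.abs() < eps:  # nearly parallel vectors: fall back to lerp
        return (1 - alpha) * z_a + alpha * z_b
    return (torch.sin((1 - alpha) * theta) * z_a
            + torch.sin(alpha * theta) * z_b) / torch.sin(theta)

# Usage idea: for each alpha, load the blended LoRA into the diffusion UNet
# and denoise from the blended latent to render one intermediate frame.
alphas = torch.linspace(0, 1, steps=16)
```

Spherical interpolation is the usual choice for Gaussian noise because it keeps intermediate latents on (approximately) the same norm shell as the endpoints, which tends to avoid the washed-out frames that plain linear blending of noise produces.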