FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
July 2, 2025
Authors: Yukang Cao, Chenyang Si, Jinghao Wang, Ziwei Liu
cs.AI
Abstract
We present FreeMorph, the first tuning-free method for image morphing that
accommodates inputs with different semantics or layouts. Unlike existing
methods that rely on finetuning pre-trained diffusion models and are limited by
time constraints and semantic/layout discrepancies, FreeMorph delivers
high-fidelity image morphing without requiring per-instance training. Despite
their efficiency and potential, tuning-free methods face challenges in
maintaining high-quality results due to the non-linear nature of the multi-step
denoising process and biases inherited from the pre-trained diffusion model. In
this paper, we introduce FreeMorph to address these challenges by integrating
two key innovations. 1) We first propose a guidance-aware spherical
interpolation design that incorporates explicit guidance from the input images
by modifying the self-attention modules, thereby addressing identity loss and
ensuring directional transitions throughout the generated sequence. 2) We
further introduce a step-oriented variation trend that blends self-attention
modules derived from each input image to achieve controlled and consistent
transitions that respect both inputs. Our extensive evaluations demonstrate
that FreeMorph outperforms existing methods, running 10x to 50x faster and
establishing a new state of the art for image morphing.
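The abstract's first innovation builds on spherical linear interpolation (slerp) between latent codes, a standard way to traverse the roughly Gaussian latent space of a diffusion model without leaving its high-density shell. The sketch below illustrates plain slerp only; it is not FreeMorph's guidance-aware variant, and the latent shapes and helper name are illustrative assumptions.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent tensors.

    Note: plain slerp for illustration; FreeMorph augments this with
    explicit guidance via modified self-attention, which is not shown here.
    """
    z0f, z1f = z0.ravel(), z1.ravel()
    # Angle between the two latents, treated as vectors on a hypersphere
    cos_theta = np.dot(z0f, z1f) / (np.linalg.norm(z0f) * np.linalg.norm(z1f))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        # Nearly parallel latents: fall back to linear interpolation
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * theta) * z0 + np.sin(t * theta) * z1) / np.sin(theta)

# Interpolate a short morphing sequence between two noise latents
# (4x64x64 is a typical Stable Diffusion latent shape, assumed here)
rng = np.random.default_rng(0)
z_a = rng.standard_normal((4, 64, 64))
z_b = rng.standard_normal((4, 64, 64))
sequence = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 5)]
```

Each latent in `sequence` would then be denoised by the pre-trained diffusion model to produce one frame of the morphing sequence; the endpoints reproduce the two input latents exactly.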