FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
July 2, 2025
Authors: Yukang Cao, Chenyang Si, Jinghao Wang, Ziwei Liu
cs.AI
Abstract
We present FreeMorph, the first tuning-free method for image morphing that
accommodates inputs with different semantics or layouts. Unlike existing
methods that rely on finetuning pre-trained diffusion models and are limited by
time constraints and semantic/layout discrepancies, FreeMorph delivers
high-fidelity image morphing without requiring per-instance training. Despite
their efficiency and potential, tuning-free methods face challenges in
maintaining high-quality results due to the non-linear nature of the multi-step
denoising process and biases inherited from the pre-trained diffusion model. In
this paper, we introduce FreeMorph to address these challenges by integrating
two key innovations. 1) We first propose a guidance-aware spherical
interpolation design that incorporates explicit guidance from the input images
by modifying the self-attention modules, thereby addressing identity loss and
ensuring directional transitions throughout the generated sequence. 2) We
further introduce a step-oriented variation trend that blends self-attention
modules derived from each input image to achieve controlled and consistent
transitions that respect both inputs. Our extensive evaluations demonstrate
that FreeMorph outperforms existing methods while being 10x-50x faster,
establishing a new state of the art for image morphing.
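The two ingredients named in the abstract can be illustrated with a minimal sketch: standard spherical linear interpolation (slerp) between two diffusion latents, and a step-dependent blend of self-attention features from the two inputs. Note this is an illustrative approximation only, not the paper's implementation: the function names (`slerp`, `blend_attention`) and the linear step-to-weight schedule are assumptions, and FreeMorph's actual guidance-aware slerp and step-oriented trend involve modifications inside the self-attention modules that are not reproduced here.

```python
import numpy as np

def slerp(z0, z1, t, dot_threshold=0.9995):
    """Spherical linear interpolation between two latent vectors.

    This is the generic slerp commonly used to interpolate diffusion
    latents; FreeMorph's variant adds explicit guidance via modified
    self-attention, which is not shown here.
    """
    z0_f, z1_f = z0.ravel(), z1.ravel()
    dot = np.dot(z0_f, z1_f) / (np.linalg.norm(z0_f) * np.linalg.norm(z1_f))
    if abs(dot) > dot_threshold:
        # Nearly parallel latents: fall back to plain linear interpolation.
        return (1 - t) * z0 + t * z1
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * z0 + np.sin(t * theta) * z1) / np.sin(theta)

def blend_attention(kv_a, kv_b, step, total_steps):
    """Step-dependent blend of key/value features from the two inputs.

    A hypothetical rendering of a 'step-oriented' trend: the blend weight
    moves with the denoising step, so early steps lean toward one input
    and later steps toward the other, giving a directional transition.
    """
    alpha = step / max(total_steps - 1, 1)
    return (1 - alpha) * kv_a + alpha * kv_b
```

For a morphing sequence, one would call `slerp(z0, z1, t)` for a grid of `t` values in [0, 1] to seed each frame's denoising, and apply the step-dependent blend inside each denoising step so the generated frames respect both inputs.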