Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
December 7, 2023
Authors: Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi
cs.AI
Abstract
Recently, diffusion models have made remarkable progress in text-to-image
(T2I) generation, synthesizing images with high fidelity and diverse contents.
Despite this advancement, latent space smoothness within diffusion models
remains largely unexplored. Smooth latent spaces ensure that a perturbation on
an input latent corresponds to a steady change in the output image. This
property proves beneficial in downstream tasks, including image interpolation,
inversion, and editing. In this work, we expose the non-smoothness of diffusion
latent spaces by observing noticeable visual fluctuations resulting from minor
latent variations. To tackle this issue, we propose Smooth Diffusion, a new
category of diffusion models that can be simultaneously high-performing and
smooth. Specifically, we introduce Step-wise Variation Regularization to
enforce that the ratio between the variation of an arbitrary input latent and
that of the output image is constant at any diffusion training step. In
addition, we devise an interpolation standard deviation (ISTD) metric to
effectively assess the latent space smoothness of a diffusion model. Extensive
quantitative and qualitative experiments demonstrate that Smooth Diffusion
stands out as a more desirable solution not only in T2I generation but also
across various downstream tasks. Smooth Diffusion is implemented as a
plug-and-play Smooth-LoRA to work with various community models. Code is
available at https://github.com/SHI-Labs/Smooth-Diffusion.
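
The two ideas named in the abstract can be illustrated with a small, self-contained sketch. It is not taken from the Smooth-Diffusion repository, and the abstract does not spell out either formula, so everything here is an assumption: `interpolation_std` reads ISTD as the standard deviation of the L2 distances between outputs generated at evenly spaced points on an interpolation path between two latents (spherical interpolation is an assumed choice), and `variation_ratio_penalty` is one simple way to penalize a latent-to-output variation ratio that drifts from a constant. The `generate` callable and the two toy generators are hypothetical stand-ins for a real diffusion sampler.

```python
import math
import statistics
from typing import Callable, List, Sequence

Vector = List[float]
Generator = Callable[[Vector], Vector]  # hypothetical stand-in for a diffusion sampler


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return l2_norm([x - y for x, y in zip(u, v)])


def slerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Spherical interpolation between two latents (assumed interpolation path)."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (l2_norm(a) * l2_norm(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:  # (anti-)parallel latents: fall back to linear interpolation
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def interpolation_std(generate: Generator, z_start: Vector, z_end: Vector,
                      num_points: int = 11) -> float:
    """Std of L2 distances between adjacent outputs along the path (assumed ISTD reading).

    A lower value means the output changes at a steadier pace along the path.
    """
    ratios = [i / (num_points - 1) for i in range(num_points)]
    outputs = [generate(slerp(z_start, z_end, t)) for t in ratios]
    steps = [l2_distance(outputs[i], outputs[i + 1]) for i in range(num_points - 1)]
    return statistics.pstdev(steps)


def variation_ratio_penalty(generate: Generator, z: Vector, delta: Vector,
                            target_ratio: float) -> float:
    """Squared gap between the output/latent variation ratio and a constant.

    Illustrative only: how the constant is chosen and which output is compared
    at each training step are not described in the abstract.
    """
    shifted = [x + d for x, d in zip(z, delta)]
    ratio = l2_distance(generate(shifted), generate(z)) / l2_norm(delta)
    return (ratio - target_ratio) ** 2


def smooth_generator(z: Vector) -> Vector:
    """Toy generator: output varies in proportion to the latent."""
    return [2.0 * x for x in z]


def jumpy_generator(z: Vector) -> Vector:
    """Toy generator: output jumps abruptly as the latent crosses thresholds."""
    return [float(math.floor(3.0 * x)) for x in z]


if __name__ == "__main__":
    z0, z1 = [1.0, 0.0], [0.0, 1.0]
    print("smooth:", interpolation_std(smooth_generator, z0, z1))
    print("jumpy: ", interpolation_std(jumpy_generator, z0, z1))
```

With the toy generators, the proportional one takes equal-sized steps along the path and scores near zero, while the thresholded one alternates between no change and sudden jumps and scores noticeably higher. In practice `generate` would wrap a full text-to-image sampling run with a fixed prompt, and the score would be averaged over many latent pairs.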