Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Abstract

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this progress, the smoothness of the latent space within diffusion models remains largely unexplored. A smooth latent space ensures that a small perturbation of an input latent corresponds to a steady change in the output image. This property is beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations that result from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization, which enforces that the ratio between the variation of an arbitrary input latent and the variation of the output image stays constant at every diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution, not only in T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA that works with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
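The abstract names the ISTD metric without giving its exact definition. As a rough illustration only, the sketch here assumes one plausible reading: generate outputs at evenly spaced interpolation points between two latents, measure the distance between adjacent outputs, and take the standard deviation of those distances, so that a smoother latent space yields a lower value. The names `slerp`, `interpolation_std`, and `generate`, the choice of 11 points, spherical interpolation, and the L2 distance are all assumptions made for this sketch, not details confirmed by the text above.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latents (assumed scheme)."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        # Nearly parallel latents: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def interpolation_std(generate, z0, z1, num_points=11):
    """Std of L2 distances between outputs at adjacent interpolation points.

    `generate` is a hypothetical stand-in for a latent-to-image sampler.
    """
    ts = np.linspace(0.0, 1.0, num_points)
    images = [generate(slerp(z0, z1, t)) for t in ts]
    dists = [np.linalg.norm(images[i + 1] - images[i]) for i in range(num_points - 1)]
    return float(np.std(dists))

# Toy check with stand-in generators (not real diffusion models).
z0 = np.array([1.0, 0.0])
z1 = np.array([0.0, 1.0])
smooth_score = interpolation_std(lambda z: z, z0, z1)
jumpy_score = interpolation_std(lambda z: (z > 0.5).astype(float), z0, z1)
print(smooth_score < jumpy_score)
```

In this toy setup the identity generator changes by the same amount at every step, while the thresholding generator stays flat and then jumps, which is the kind of visual fluctuation the abstract describes. A real evaluation would replace `generate` with an actual T2I sampling pipeline.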
