Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Abstract

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this progress, latent space smoothness in diffusion models remains largely unexplored. A smooth latent space ensures that a perturbation of an input latent corresponds to a steady change in the output image. This property benefits downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations that result from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization, which enforces that the ratio between the variation of an arbitrary input latent and the variation of the output image is constant at every diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion is a more desirable solution not only for T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA that works with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
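The abstract describes the ISTD metric only at a high level. The sketch below illustrates one plausible reading, assuming ISTD is the standard deviation of the distances between consecutive outputs along an interpolation path between two latents: if equal latent steps produce equally sized visual changes, the distances are uniform and ISTD is near zero, while an abrupt visual jump inflates it. Everything here is a hypothetical illustration rather than code from the paper or the Smooth-Diffusion repository: the names `istd`, `lerp`, and `l2_distance`, the toy generators that stand in for a diffusion model, the pixel-space L2 distance (a real evaluation might use a perceptual metric), linear rather than spherical interpolation, and the default of 10 steps.

```python
import math
from statistics import pstdev
from typing import Callable, List, Sequence

Latent = List[float]
Image = List[float]


def lerp(z0: Sequence[float], z1: Sequence[float], t: float) -> Latent:
    """Linearly interpolate between two latents (t=0 gives z0, t=1 gives z1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two flattened images (stand-in for a perceptual metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def istd(
    generate: Callable[[Latent], Image],
    z0: Sequence[float],
    z1: Sequence[float],
    steps: int = 10,
    distance: Callable[[Sequence[float], Sequence[float]], float] = l2_distance,
) -> float:
    """Hypothetical ISTD: std of distances between consecutive interpolated outputs.

    Lower values mean the outputs change at a more even rate along the path,
    i.e. the latent space behaves more smoothly between z0 and z1.
    """
    if steps < 2:
        raise ValueError("need at least two interpolation intervals")
    # Decode steps + 1 evenly spaced latents on the path from z0 to z1.
    outputs = [generate(lerp(z0, z1, i / steps)) for i in range(steps + 1)]
    # Distance between each pair of neighbouring outputs.
    gaps = [distance(outputs[i], outputs[i + 1]) for i in range(steps)]
    # Population standard deviation of those step sizes.
    return pstdev(gaps)


# Toy "generators" standing in for a diffusion model (purely illustrative).
def smooth_generator(z: Latent) -> Image:
    # Linear map: equal latent steps give equal output steps.
    return [2.0 * v for v in z]


def jumpy_generator(z: Latent) -> Image:
    # Threshold: the output jumps abruptly once a coordinate reaches 0.5.
    return [0.0 if v < 0.5 else 1.0 for v in z]


if __name__ == "__main__":
    z_start, z_end = [0.0, 0.0], [1.0, 1.0]
    print("smooth ISTD:", istd(smooth_generator, z_start, z_end))
    print("jumpy ISTD:", istd(jumpy_generator, z_start, z_end))
```

With these toy generators, the linear map yields an ISTD of essentially zero, while the thresholded map yields a clearly positive value because a single interpolation step carries the entire visual change. Spherical interpolation (slerp) is a common alternative to `lerp` for Gaussian noise latents and could be swapped in without changing the rest of the sketch.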
