Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Abstract

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this progress, the smoothness of the latent space within diffusion models remains largely unexplored. A smooth latent space ensures that a perturbation of an input latent corresponds to a steady change in the output image. This property is beneficial in downstream tasks such as image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations that result from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that are simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization, which enforces that the ratio between the variation of an arbitrary input latent and the variation of the output image is constant at every diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion is a more desirable solution not only for T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA that works with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
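
The abstract names the ISTD metric without spelling out how it is computed, so the following is only a minimal sketch of one plausible reading: interpolate between two latents, generate an image at each point, measure the distance between adjacent images, and take the standard deviation of those distances. The function names (`slerp`, `interpolation_std`), the `generate` callable, the use of spherical interpolation, the L2 distance, and the 11 interpolation points are all assumptions made for illustration, not details confirmed by the text above.

```python
import numpy as np


def slerp(z0, z1, t):
    """Spherical interpolation between two latents (assumed interpolation scheme)."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        # Nearly parallel latents: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)


def interpolation_std(generate, z0, z1, num_steps=11):
    """Standard deviation of distances between adjacent interpolated outputs.

    `generate` is a hypothetical stand-in for a full latent-to-image pipeline.
    A lower value means the output changes more evenly along the path.
    """
    ts = np.linspace(0.0, 1.0, num_steps)
    images = [np.asarray(generate(slerp(z0, z1, t)), dtype=float) for t in ts]
    dists = [np.linalg.norm(images[i + 1] - images[i]) for i in range(num_steps - 1)]
    return float(np.std(dists))


if __name__ == "__main__":
    z0 = np.array([1.0, 0.0])
    z1 = np.array([0.0, 1.0])

    # Toy "smooth" generator: output varies evenly with the latent.
    smooth = lambda z: 2.0 * z
    # Toy "non-smooth" generator: output jumps abruptly at a threshold.
    jumpy = lambda z: (z > 0.5).astype(float)

    print("smooth generator ISTD:", interpolation_std(smooth, z0, z1))
    print("jumpy generator ISTD: ", interpolation_std(jumpy, z0, z1))
```

In this toy setup the linear generator produces equal steps along the path and therefore a standard deviation near zero, while the thresholded generator concentrates all of its change in a few abrupt jumps and scores higher. For a real model, `generate` would wrap the diffusion sampling loop and the score would presumably be averaged over many latent pairs and prompts.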
