DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
July 3, 2024
Authors: Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis
cs.AI
Abstract
Diffusion models (DMs) have revolutionized generative learning. They utilize
a diffusion process to encode data into a simple Gaussian distribution.
However, encoding a complex, potentially multimodal data distribution into a
single continuous Gaussian distribution arguably represents an unnecessarily
challenging learning problem. We propose Discrete-Continuous Latent Variable
Diffusion Models (DisCo-Diff) to simplify this task by introducing
complementary discrete latent variables. We augment DMs with learnable discrete
latents, inferred with an encoder, and train DM and encoder end-to-end.
DisCo-Diff does not rely on pre-trained networks, making the framework
universally applicable. The discrete latents significantly simplify learning
the DM's complex noise-to-data mapping by reducing the curvature of the DM's
generative ODE. An additional autoregressive transformer models the
distribution of the discrete latents, a simple step because DisCo-Diff requires
only a few discrete variables with small codebooks. We validate DisCo-Diff on toy
data, several image synthesis tasks as well as molecular docking, and find that
introducing discrete latents consistently improves model performance. For
example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned
ImageNet-64/128 datasets with an ODE sampler.
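The two-stage generation the abstract describes — first sample the discrete latents, then integrate the DM's generative ODE conditioned on them — can be illustrated on a toy 1D example. The sketch below is a minimal illustration, not the paper's implementation: all names and values are assumptions. The data is a two-mode Gaussian mixture; a single binary latent z selects the mode, so each conditional p(x | z) is a single Gaussian whose probability-flow ODE is nearly straight, mirroring the curvature-reduction argument.

```python
import numpy as np

# Toy DisCo-Diff-style sampling sketch (illustrative assumptions only).
# Data: 1D mixture of two Gaussians. A binary discrete latent z picks the
# mode, so the conditional score has a simple closed form.

rng = np.random.default_rng(0)
MU = {0: -3.0, 1: 3.0}   # mode means, selected by the discrete latent z
S = 0.5                  # within-mode standard deviation
SIGMA_MAX = 20.0         # starting noise level of the diffusion

def score(x, sigma, z):
    # Score of N(MU[z], S^2) convolved with noise N(0, sigma^2).
    return -(x - MU[z]) / (S**2 + sigma**2)

def sample(z, n_steps=200):
    # Stage 2: Euler-integrate the probability-flow ODE
    # dx/dsigma = -sigma * score(x, sigma, z) from SIGMA_MAX down to 0,
    # conditioned on the discrete latent z.
    sigmas = np.linspace(SIGMA_MAX, 0.0, n_steps + 1)
    x = rng.normal(0.0, SIGMA_MAX)  # start from the Gaussian prior
    for i in range(n_steps):
        d = -sigmas[i] * score(x, sigmas[i], z)
        x = x + d * (sigmas[i + 1] - sigmas[i])
    return x

# Stage 1: sample the discrete latent. In the paper this is a trained
# autoregressive transformer; here a coin flip stands in for it.
z = int(rng.random() < 0.5)
x = sample(z)
print(z, round(x, 2))  # the sample lands near the mode chosen by z
```

Because z removes the multimodality, the conditioned ODE trajectory never has to "decide" between modes mid-integration; the unconditional version of the same ODE would bend sharply near x = 0, which is the learning difficulty the discrete latents are meant to remove.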