DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
July 3, 2024
Authors: Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis
cs.AI
Abstract
Diffusion models (DMs) have revolutionized generative learning. They utilize
a diffusion process to encode data into a simple Gaussian distribution.
However, encoding a complex, potentially multimodal data distribution into a
single continuous Gaussian distribution arguably represents an unnecessarily
challenging learning problem. We propose Discrete-Continuous Latent Variable
Diffusion Models (DisCo-Diff) to simplify this task by introducing
complementary discrete latent variables. We augment DMs with learnable discrete
latents, inferred with an encoder, and train DM and encoder end-to-end.
DisCo-Diff does not rely on pre-trained networks, making the framework
universally applicable. The discrete latents significantly simplify learning
the DM's complex noise-to-data mapping by reducing the curvature of the DM's
generative ODE. An additional autoregressive transformer models the
distribution of the discrete latents, a simple step because DisCo-Diff requires
only a few discrete variables with small codebooks. We validate DisCo-Diff on toy
data, several image synthesis tasks as well as molecular docking, and find that
introducing discrete latents consistently improves model performance. For
example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned
ImageNet-64/128 datasets with an ODE sampler.
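The two-stage generative process the abstract describes — sample the discrete latents from an autoregressive model, then run the diffusion model's generative ODE conditioned on them — can be sketched as a minimal NumPy toy. All names, sizes, and the constant-per-code "denoiser" below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: the abstract stresses that DisCo-Diff needs
# just a few discrete variables with small codebooks.
NUM_LATENTS = 4   # number of discrete latent variables (assumed)
CODEBOOK = 8      # codebook size per latent (assumed)
DATA_DIM = 2      # toy data dimensionality

def sample_discrete_latents():
    """Stage 1: stand-in for the autoregressive transformer.

    Codes are drawn uniformly here; the real model samples each code
    conditioned on the previously sampled ones.
    """
    return rng.integers(0, CODEBOOK, size=NUM_LATENTS)

def denoiser(x, t, z):
    """Toy denoiser conditioned on the discrete latents z.

    In DisCo-Diff this is the trained diffusion network; this stand-in
    only shows the conditioning interface: the predicted clean sample
    depends on the codes z (each code combination maps to one "mode").
    """
    shift = z.mean() / CODEBOOK - 0.5   # latent-dependent mode location
    return np.full_like(x, shift)       # "predicted" clean sample

def sample(num_steps=10):
    """Stage 2: Euler discretization of a probability-flow-style ODE,
    dx/dt = (x - x0_hat) / t, integrated from t = 1 down to t = 0."""
    z = sample_discrete_latents()
    x = rng.standard_normal(DATA_DIM)   # start from Gaussian noise
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = denoiser(x, t, z)
        x = x + (t_next - t) * (x - x0_hat) / t
    return z, x

z, x = sample()
```

Conditioning each ODE trajectory on a fixed code vector is what lets the continuous model target one mode at a time, which is the intuition behind the claimed reduction in the generative ODE's curvature.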