DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Abstract

Diffusion models (DMs) have revolutionized generative learning. They use a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution is arguably an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train the DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only a few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, and molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
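
The two-stage sampling described in the abstract (first draw the discrete latents, then run the latent-conditioned generative ODE) can be illustrated with a minimal toy sketch in Python. Everything in it is an assumption made for illustration and is not taken from the paper: the data is a 1-D mixture of two Gaussians, a fixed categorical prior stands in for the autoregressive transformer, a closed-form Gaussian denoiser stands in for the learned network, and the noising convention `x_sigma = x_0 + sigma * eps` with a plain Euler solver is chosen for simplicity. All function names (`sample_latent`, `denoiser`, `ode_sample`, `closed_form`, `generate`) are hypothetical.

```python
import math
import random

# Toy setup (illustrative assumptions, not values from the paper).
MEANS = [-4.0, 4.0]   # mode centers; the discrete latent z indexes the mode
DATA_STD = 0.5        # standard deviation of each mode
PRIOR = [0.5, 0.5]    # p(z); stands in for the autoregressive prior
SIGMA_MAX = 80.0      # largest noise level, with x_sigma = x_0 + sigma * eps


def sample_latent(rng):
    """Stage 1: draw the discrete latent z from its prior."""
    return rng.choices(range(len(PRIOR)), weights=PRIOR, k=1)[0]


def denoiser(x, sigma, z):
    """Posterior mean E[x_0 | x_sigma, z]; exact because each mode is Gaussian."""
    mu, var = MEANS[z], DATA_STD ** 2
    return mu + var / (var + sigma ** 2) * (x - mu)


def ode_sample(z, x_init, steps=2000):
    """Stage 2: Euler steps on dx/dsigma = (x - D(x, sigma, z)) / sigma."""
    x = x_init
    for i in range(steps):
        sigma = SIGMA_MAX * (1.0 - i / steps)             # current noise level
        sigma_next = SIGMA_MAX * (1.0 - (i + 1) / steps)  # reaches 0 at the end
        slope = (x - denoiser(x, sigma, z)) / sigma
        x += (sigma_next - sigma) * slope
    return x


def closed_form(z, x_init):
    """Exact ODE solution for a Gaussian mode, used to check the Euler solver."""
    mu = MEANS[z]
    scale = DATA_STD / math.sqrt(DATA_STD ** 2 + SIGMA_MAX ** 2)
    return mu + (x_init - mu) * scale


def generate(rng):
    """Full pipeline: sample z, then integrate the z-conditioned ODE."""
    z = sample_latent(rng)
    # N(0, SIGMA_MAX^2) approximates the noised data distribution at SIGMA_MAX.
    x_init = rng.gauss(0.0, SIGMA_MAX)
    return z, ode_sample(z, x_init)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        z, x = generate(rng)
        print(z, round(x, 3))
```

In this toy case, conditioning on `z` turns a bimodal target into a single Gaussian. The conditional denoiser is then linear in `x`, and the ODE has the simple closed-form solution computed by `closed_form`. This only loosely mirrors the abstract's claim that discrete latents simplify the noise-to-data mapping; it is not a reproduction of the paper's experiments.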
