Generalized Discrete Diffusion from Snapshots

Abstract

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: https://oussamazekri.fr/gdds.
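To make the role of uniformization in the forward process concrete, the following is a minimal sketch of the textbook construction for a single token, not the GDDS implementation. It assumes a continuous-time Markov chain with a time-homogeneous rate matrix `Q` (non-negative off-diagonal entries, rows summing to zero); the function name `uniformized_snapshot` and all parameter names are hypothetical. The idea is that a snapshot at time `t` can be drawn by sampling a Poisson number of jumps and applying the stochastic matrix `P = I + Q / lam` that many times, without simulating holding times along the path.

```python
import numpy as np


def uniformized_snapshot(x0, Q, t, rng):
    """Sample X_t given X_0 = x0 for a CTMC with rate matrix Q.

    Assumption (illustrative only): Q is time-homogeneous, with
    non-negative off-diagonal rates and rows that sum to zero.
    """
    num_states = Q.shape[0]
    # Uniformization rate: the largest exit rate over all states.
    lam = float(np.max(-np.diag(Q)))
    if lam == 0.0:
        # No state can ever be left, so the chain stays at x0.
        return x0
    # Transition matrix of the embedded (uniformized) discrete-time chain.
    P = np.eye(num_states) + Q / lam
    # The number of candidate jumps in [0, t] is Poisson(lam * t).
    n_jumps = rng.poisson(lam * t)
    x = x0
    for _ in range(n_jumps):
        # Self-transitions are allowed; they encode "no real jump".
        x = rng.choice(num_states, p=P[x])
    return x


# Example: uniform corruption over 3 states, rate 1 to each other state.
Q_uniform = np.ones((3, 3)) - 3.0 * np.eye(3)
rng = np.random.default_rng(0)
snapshot = uniformized_snapshot(0, Q_uniform, t=0.5, rng=rng)
```

For this uniform rate matrix the probability of still being in the initial state at time `t` is `1/3 + (2/3) * exp(-3 t)`, which gives a simple sanity check for the sampler. Any other valid rate matrix can be passed in place of `Q_uniform`, which is the sense in which the corruption dynamics are arbitrary in this sketch.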