
The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

February 24, 2026
Authors: Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
cs.AI

Abstract

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferable to autoregressive or masked diffusion models in these settings. However, their sampling quality plateaus under ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalizes prior methods and applies to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR-10. Crucially, unlike conventional samplers, our PC methods continue to improve as the number of sampling steps grows. Taken together, these findings call into question the assumption that masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian-relaxation training phase that reduces training time by 25% and memory by 33% relative to Duo, while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at: https://s-sahoo.com/duo-ch2
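
To make the Predictor-Corrector idea concrete, here is a minimal, illustrative sketch of one possible PC loop for uniform-state discrete diffusion. Everything here is an assumption for exposition rather than the paper's exact method: the `denoiser(x, t)` signature (returning logits over clean tokens), the linear schedule `alpha`, and the simple "denoise, then re-noise at the current level" corrector are all placeholder choices.

```python
import torch

def alpha(t: float) -> float:
    # Illustrative linear schedule: probability that a token is still
    # clean (uncorrupted) at time t in [0, 1]. The paper's schedule may differ.
    return 1.0 - t

def renoise(x0: torch.Tensor, t: float, vocab_size: int) -> torch.Tensor:
    # Uniform-state forward corruption q(x_t | x_0): keep each token with
    # probability alpha(t); otherwise resample it uniformly over the vocabulary.
    keep = torch.rand(x0.shape, device=x0.device) < alpha(t)
    noise = torch.randint_like(x0, vocab_size)
    return torch.where(keep, x0, noise)

@torch.no_grad()
def pc_sample(denoiser, vocab_size: int, seq_len: int,
              num_steps: int = 64, corrector_steps: int = 1) -> torch.Tensor:
    # Hypothetical denoiser: denoiser(x, t) -> logits over clean tokens x_0,
    # with shape (batch, seq_len, vocab_size).
    x = torch.randint(vocab_size, (1, seq_len))  # prior: i.i.d. uniform tokens
    ts = torch.linspace(1.0, 0.0, num_steps + 1).tolist()
    for t, s in zip(ts[:-1], ts[1:]):
        # Predictor: estimate x_0 from x_t, then jump to noise level s
        # by sampling x_s ~ q(x_s | x0_hat).
        x0_hat = torch.distributions.Categorical(logits=denoiser(x, t)).sample()
        x = renoise(x0_hat, s, vocab_size)
        # Corrector: at the *same* level s, denoise and re-noise again.
        # Because uniform-state diffusion can overwrite any token, this step
        # lets the model revise earlier mistakes instead of freezing them.
        for _ in range(corrector_steps):
            x0_hat = torch.distributions.Categorical(logits=denoiser(x, s)).sample()
            x = renoise(x0_hat, s, vocab_size)
    return x  # at s = 0, alpha(0) = 1, so x equals the final x0_hat

```

With `corrector_steps = 0` this collapses to a plain denoise-and-renoise ancestral loop; raising it spends extra model evaluations per level, which is the mechanism behind the abstract's claim that PC sampling keeps improving with more steps rather than plateauing.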