
Simple and Effective Masked Diffusion Language Models

June 11, 2024
Authors: Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov
cs.AI

Abstract

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We release our code at: https://github.com/kuleshov-group/mdlm
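The abstract describes the training objective as a weighted mixture of classical masked language modeling losses, optimized by an encoder-only model. The snippet below is a minimal PyTorch sketch of what one such training step can look like under a linear masking schedule alpha_t = 1 - t, where the continuous-time weight alpha_t' / (1 - alpha_t) reduces to -1/t. The names `model`, `mask_token_id`, and the tensor shapes are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, x0, mask_token_id):
    """One training step of a masked discrete diffusion objective.

    A minimal sketch assuming a linear schedule alpha_t = 1 - t.
    `model` is any encoder-only Transformer returning (B, L, V) logits over
    the vocabulary; `mask_token_id` is the id of the [MASK] token. Both are
    placeholders, not the paper's API.
    """
    B, L = x0.shape
    # Sample a diffusion time t ~ U(0, 1] per sequence.
    t = torch.rand(B, device=x0.device).clamp(min=1e-3)           # (B,)
    # Forward process: each token is independently replaced by [MASK]
    # with probability 1 - alpha_t = t under the linear schedule.
    masked = torch.rand(B, L, device=x0.device) < t[:, None]      # (B, L) bool
    z_t = torch.where(masked, torch.full_like(x0, mask_token_id), x0)
    # The denoiser predicts the clean tokens from the partially masked input.
    logits = model(z_t)                                            # (B, L, V)
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (B, L)
    # The weight alpha_t' / (1 - alpha_t) = -1/t turns the objective into a
    # 1/t-weighted mixture of standard MLM losses over masked positions.
    loss = ((ce * masked).sum(dim=1) / (t * L)).mean()
    return loss
```

Because the loss only ever scores masked positions against the clean sequence, any standard bidirectional (BERT-style) encoder can serve as the denoiser, which is what enables the efficient and semi-autoregressive samplers mentioned in the abstract.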
