Simple and Effective Masked Diffusion Language Models

June 11, 2024
Authors: Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov
cs.AI

Abstract

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We release our code at: https://github.com/kuleshov-group/mdlm
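The "mixture of classical masked language modeling losses" has a compact continuous-time form. As a sketch (notation paraphrased here, not quoted from the paper): with α_t the probability that a token survives unmasked at diffusion time t, z_t the partially masked sequence, x^ℓ the one-hot clean token at position ℓ, and x_θ the denoiser's predicted distribution,

\mathcal{L}_{\text{NELBO}} \;=\; \mathbb{E}_{q} \int_{0}^{1} \frac{\alpha_t'}{1-\alpha_t} \sum_{\ell} \log \big\langle x_\theta^{\ell}(z_t),\, x^{\ell} \big\rangle \, dt,

where ⟨·,·⟩ picks out the predicted probability of the true token. Since only masked positions contribute, each time step reduces to an ordinary masked-LM cross-entropy, reweighted by the noise schedule.

To make that reading concrete, here is a minimal PyTorch sketch of one Monte Carlo estimate of such an objective under a linear schedule α_t = 1 − t. The function name mdlm_loss, the schedule choice, and the mask-token handling are illustrative assumptions, not the authors' released implementation (see the linked repository for that):

```python
import torch
import torch.nn.functional as F

def mdlm_loss(model, x, mask_id, eps=1e-3):
    """One Monte Carlo estimate of a masked-diffusion NELBO with a
    linear masking schedule alpha_t = 1 - t (illustrative sketch).

    model   : encoder-only network, token ids (B, L) -> logits (B, L, V)
    x       : clean token ids, shape (B, L)
    mask_id : id of the [MASK] token (hypothetical; tokenizer-dependent)
    """
    B, L = x.shape
    t = torch.rand(B, device=x.device).clamp(min=eps)   # diffusion time in (0, 1)
    # Each token is masked independently with probability 1 - alpha_t = t.
    masked = torch.rand(B, L, device=x.device) < t[:, None]
    z = torch.where(masked, torch.full_like(x, mask_id), x)

    logp = F.log_softmax(model(z), dim=-1)              # (B, L, V)
    token_logp = logp.gather(-1, x.unsqueeze(-1)).squeeze(-1)

    # Integrand weight alpha'_t / (1 - alpha_t) = -1/t, so the estimate is a
    # reweighted masked cross-entropy over the masked positions only.
    return ((1.0 / t)[:, None] * (-token_logp) * masked.float()).sum(-1).mean()
```

Sampling runs the masking process in reverse: starting from an all-[MASK] sequence, the denoiser iteratively unmasks positions. Because the model is a plain encoder, the abstract notes this also admits efficient samplers, including semi-autoregressive generation of arbitrary-length text block by block.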
