Simple and Effective Masked Diffusion Language Models

Abstract

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that yields additional improvements. Our objective has a simple form, a mixture of classical masked language modeling losses, and can be used to train encoder-only language models that admit efficient samplers. These include samplers that can generate text of arbitrary length semi-autoregressively, like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state of the art among diffusion models and approaches AR perplexity. We release our code at: https://github.com/kuleshov-group/mdlm
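To illustrate what "a mixture of classical masked language modeling losses" can look like, here is a minimal Python sketch. It assumes a linear masking schedule alpha_t = 1 - t. Under that assumption, the weight on each masked-LM term works out to 1/t, so the objective reads roughly as E_{t~U(0,1)} E_{z_t} [ (1/t) * sum over masked positions l of -log p_theta(x_l | z_t) ]. The names MASK, forward_mask, weighted_mlm_loss, mdlm_loss_estimate and the log_prob interface are hypothetical. The uniform toy model stands in for a real encoder-only network. This is an illustration under stated assumptions, not the released implementation.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical id for the [MASK] token, used only in this sketch

# log_prob(z, i, tok) -> log p_theta(x_i = tok | z); a hypothetical model interface
LogProbFn = Callable[[Sequence[int], int, int], float]


def forward_mask(x: Sequence[int], t: float, rng: random.Random) -> List[int]:
    """Forward (absorbing-state) process under the assumed linear schedule
    alpha_t = 1 - t: each token is independently replaced by MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in x]


def weighted_mlm_loss(x: Sequence[int], z: Sequence[int], t: float,
                      log_prob: LogProbFn) -> float:
    """Classical masked-LM cross-entropy summed over masked positions only,
    scaled by 1/t. Unmasked positions add nothing here because the model is
    assumed to copy them through unchanged."""
    nll = sum(-log_prob(z, i, x[i]) for i, zi in enumerate(z) if zi == MASK)
    return nll / t


def mdlm_loss_estimate(x: Sequence[int], log_prob: LogProbFn, n_samples: int,
                       t_min: float = 1e-3, seed: int = 0) -> float:
    """Monte Carlo estimate of the mixture of weighted masked-LM losses over
    masking rates t. t_min (an assumed numerical guard) keeps 1/t finite."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(t_min, 1.0)
        z = forward_mask(x, t, rng)
        total += weighted_mlm_loss(x, z, t, log_prob)
    return total / n_samples


def uniform_log_prob(z: Sequence[int], i: int, tok: int, vocab_size: int = 10) -> float:
    """Toy 'model' that assigns equal probability to every token in a small vocabulary."""
    return -math.log(vocab_size)


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    estimate = mdlm_loss_estimate(x, uniform_log_prob, n_samples=20000)
    print(estimate, len(x) * math.log(10))
```

For the uniform toy model the estimate should land close to L * log V (here 8 * log 10). About t * L tokens are masked at rate t, and the 1/t weight cancels that factor, so every masking rate contributes the same amount on average. A real training loop would swap in a neural network for uniform_log_prob and average over batches of sequences.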
