Simplified and Generalized Masked Diffusion for Discrete Data
June 6, 2024
Authors: Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias
cs.AI
Abstract
Masked (or absorbing) diffusion is actively explored as an alternative to
autoregressive models for generative modeling of discrete data. However,
existing work in this area has been hindered by unnecessarily complex model
formulations and unclear relationships between different perspectives, leading
to suboptimal parameterization, training objectives, and ad hoc adjustments to
counteract these issues. In this work, we aim to provide a simple and general
framework that unlocks the full potential of masked diffusion models. We show
that the continuous-time variational objective of masked diffusion models is a
simple weighted integral of cross-entropy losses. Our framework also enables
training generalized masked diffusion models with state-dependent masking
schedules. When evaluated by perplexity, our models trained on OpenWebText
surpass prior diffusion language models at GPT-2 scale and demonstrate superior
performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our
models vastly outperform previous discrete diffusion models on pixel-level
image modeling, achieving 2.78 (CIFAR-10) and 3.42 (ImageNet 64×64) bits
per dimension, results that are comparable to or better than autoregressive
models of similar sizes.
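The abstract's central claim is that the continuous-time variational objective of masked diffusion reduces to a weighted integral of cross-entropy losses over masked positions. Below is a minimal Monte Carlo sketch of that idea, assuming a linear masking schedule α(t) = 1 − t and a toy stand-in for the denoising model; the function names (`loss_estimate`, `alpha`) and the small vocabulary are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, L = 10, 8   # toy vocab size and sequence length
MASK = V       # extra token id used as the absorbing (mask) state

def alpha(t):
    # Linear masking schedule: fraction of tokens still unmasked at time t.
    return 1.0 - t

def alpha_prime(t):
    # Derivative of the linear schedule.
    return -1.0

def loss_estimate(x, model_logits_fn, n_samples=1000):
    """Monte Carlo estimate of the continuous-time objective:
    -E_t[ alpha'(t) / (1 - alpha(t)) * sum_{masked i} log p(x_i | x_t) ],
    i.e. a weighted cross-entropy over the masked positions."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(0.01, 1.0)             # clip away t ~ 0 to tame variance
        keep = rng.uniform(size=L) < alpha(t)  # each token survives w.p. alpha(t)
        x_t = np.where(keep, x, MASK)          # forward process: mask the rest
        logits = model_logits_fn(x_t)          # (L, V) predicted clean-token logits
        logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
        ce = -logp[np.arange(L), x]            # per-position cross-entropy
        w = -alpha_prime(t) / (1.0 - alpha(t)) # time-dependent loss weight
        total += w * ce[~keep].sum()           # only masked positions contribute
    return total / n_samples

# Usage with a trivial "model" that predicts a uniform distribution:
# each masked token costs log(V), and the t-weighting makes the estimate
# concentrate around L * log(V) regardless of t.
x = rng.integers(0, V, size=L)
uniform_logits = lambda x_t: np.zeros((L, V))
est = loss_estimate(x, uniform_logits)  # ≈ L * log(V) ≈ 18.4
```

The uniform-model check illustrates why the weighting is natural: the expected number of masked tokens at time t is L·(1 − α(t)), so the factor 1/(1 − α(t)) cancels it and every time slice contributes equally to the integral.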