Uniform Diffusion Models Revisited: Leave-One-Out Denoisers and an Absorbing-State Reformulation

Abstract

Discrete diffusion models are often trained through clean-data prediction, but that prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is optimized not by the denoising posterior but by a leave-one-out posterior, which predicts each clean token without using its own noisy observation. This reveals a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle the parameterization from the training objective. Our results also yield inference improvements that require no additional training: an informed predictor-corrector sampler and improved temperature sampling based on the leave-one-out predictor. We further introduce an absorbing-state reformulation of uniform diffusion that preserves the UDM joint law while decomposing it into masked-diffusion-like sampling operations, with simpler denoising posteriors, carry-over unmasking, and a natural remasking mechanism. On language modeling, leave-one-out parameterizations consistently improve UDM generation, while the absorbing construction matches or surpasses masked diffusion. These results suggest that the empirical gap between masked and uniform diffusion is driven less by the choice of marginals than by parameterization and sampling design. The code and models are available at https://github.com/samsongourevitch/rev_udm.
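To make the distinction between the denoiser and the leave-one-out posterior concrete, here is a minimal sketch for a single token position l, assuming a per-token uniform forward kernel q(z | x) = α·[z = x] + (1 − α)/K with keep probability α < 1 and vocabulary size K. Because z^l depends on the sequence only through x^l, Bayes' rule gives p(x^l | z) ∝ p(x^l | z^{-l}) · q(z^l | x^l), and the map can be inverted by dividing by the likelihood. The score ratios below use the common "concrete score" definition p_t(z with z^l replaced by j) / p_t(z). All function names are hypothetical, and this is a textbook Bayes identity under the stated kernel, not the conversion code from the repository above.

```python
import numpy as np


def uniform_likelihood(z_l: int, num_classes: int, alpha: float) -> np.ndarray:
    """Return q(z^l | x^l = i) for every clean token i.

    Assumes the uniform kernel q(z | x) = alpha * [z == x] + (1 - alpha) / K.
    """
    lik = np.full(num_classes, (1.0 - alpha) / num_classes)
    lik[z_l] += alpha
    return lik


def loo_to_denoiser(loo: np.ndarray, z_l: int, alpha: float) -> np.ndarray:
    """Leave-one-out posterior p(x^l | z^{-l}) -> denoiser p(x^l | z).

    Bayes: p(x^l | z) is proportional to p(x^l | z^{-l}) * q(z^l | x^l).
    """
    unnorm = loo * uniform_likelihood(z_l, loo.shape[-1], alpha)
    return unnorm / unnorm.sum()


def denoiser_to_loo(den: np.ndarray, z_l: int, alpha: float) -> np.ndarray:
    """Denoiser -> leave-one-out posterior by dividing out the likelihood.

    Well defined when alpha < 1, since every likelihood entry is then positive.
    """
    unnorm = den / uniform_likelihood(z_l, den.shape[-1], alpha)
    return unnorm / unnorm.sum()


def loo_to_score_ratios(loo: np.ndarray, z_l: int, alpha: float) -> np.ndarray:
    """Ratios p_t(z with z^l -> j) / p_t(z) for every candidate token j.

    p_t(z^l = j | z^{-l}) = sum_i loo_i * q(j | i) = alpha * loo_j + (1 - alpha) / K
    (using that loo sums to one); the factor p_t(z^{-l}) cancels in the ratio.
    """
    k = loo.shape[-1]
    predictive = alpha * loo + (1.0 - alpha) / k
    return predictive / predictive[z_l]
```

For example, with K = 3, α = 0.5, observed token z^l = 1 and leave-one-out posterior (0.7, 0.2, 0.1), the denoiser is (0.4375, 0.5, 0.0625): the noisy observation pulls probability toward token 1, which is exactly the information the leave-one-out posterior excludes. Applying `denoiser_to_loo` recovers (0.7, 0.2, 0.1), and the score ratios are (1.9375, 1.0, 0.8125).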