Uniform Diffusion Models Revisited: Leave-One-Out Denoisers and an Absorbing-State Reformulation

Abstract

Discrete diffusion models are often trained through clean-data prediction, but the prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is not optimized by the denoising posterior, but by a leave-one-out posterior that predicts each clean token without using its own noisy observation. This identifies a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle parameterization and training objective. Our results also yield inference improvements that require no additional training: an informed predictor-corrector sampler and improved temperature sampling based on the leave-one-out predictor. We further introduce an absorbing-state reformulation of uniform diffusion that preserves the UDM joint law while decomposing it into masked-diffusion-like sampling operations, with simpler denoising posteriors, carry-over unmasking, and a natural remasking mechanism. On language modeling, leave-one-out parameterizations consistently improve UDM generation, while the absorbing construction matches or surpasses masked diffusion. These results suggest that the empirical gap between masked and uniform diffusion is driven less by the choice of marginals themselves than by parameterization and sampling design. Code and models are available at https://github.com/samsongourevitch/rev_udm.