Revisiting Uniform Diffusion Models: Leave-One-Out Denoisers and Absorbing-State Reformulation

Abstract

Discrete diffusion models are often trained through clean-data prediction, but that prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is optimized not by the denoising posterior but by a leave-one-out posterior, which predicts each clean token without using that token's own noisy observation. This identifies a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle the parameterization from the training objective. Our results also yield inference improvements without any additional training, through an informed predictor-corrector sampler and improved temperature sampling based on the leave-one-out predictor. We further introduce an absorbing-state reformulation of uniform diffusion that preserves the UDM joint law while decomposing it into masked-diffusion-like sampling operations, with simpler denoising posteriors, carry-over unmasking, and a natural remasking mechanism. On language modeling, leave-one-out parameterizations consistently improve UDM generation, while the absorbing construction matches or surpasses masked diffusion. These results suggest that the empirical gap between masked and uniform diffusion is driven less by the choice of marginals themselves than by parameterization and sampling design. Code and models are available at https://github.com/samsongourevitch/rev_udm.
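To make the distinction between the denoising posterior and the leave-one-out posterior concrete, the following is a minimal sketch of how the two relate for a single token position. It is not taken from the paper or its repository. It assumes a forward kernel of the form q(z | x) = alpha * 1[z = x] + (1 - alpha) / K applied independently per token, with K the vocabulary size and 0 <= alpha < 1, in which case Bayes' rule gives p(x | all noisy tokens) proportional to p(x | other noisy tokens) * q(z | x). The function names and the toy numbers are hypothetical.

```python
def uniform_likelihood(z, vocab_size, alpha):
    """q(z | x) for each candidate clean token x, under the ASSUMED kernel
    q(z | x) = alpha * 1[z == x] + (1 - alpha) / vocab_size."""
    off = (1.0 - alpha) / vocab_size
    return [alpha + off if x == z else off for x in range(vocab_size)]


def loo_to_denoiser(loo, z, alpha):
    """Leave-one-out posterior -> denoising posterior for one position.

    Multiplies by the likelihood of the position's own noisy token z
    and renormalizes (Bayes' rule)."""
    lik = uniform_likelihood(z, len(loo), alpha)
    unnorm = [p * l for p, l in zip(loo, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def denoiser_to_loo(den, z, alpha):
    """Denoising posterior -> leave-one-out posterior for one position.

    Divides out the likelihood of z and renormalizes. Requires alpha < 1
    so that every likelihood entry is strictly positive."""
    lik = uniform_likelihood(z, len(den), alpha)
    unnorm = [p / l for p, l in zip(den, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example (hypothetical numbers): 3-token vocabulary, noisy token z = 2.
loo = [0.5, 0.3, 0.2]
den = loo_to_denoiser(loo, z=2, alpha=0.4)
recovered = denoiser_to_loo(den, z=2, alpha=0.4)
```

In this toy case the denoising posterior places more mass on the observed token than the leave-one-out posterior does, and the round trip recovers the leave-one-out distribution up to floating-point error. That is the sense in which the two predictors carry the same information but differ in whether the position's own noisy observation has been folded in.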