Generalized Discrete Diffusion from Snapshots

Abstract

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization, which enables fast simulation of arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents rather than on the entire noising path. This bound has a clear probabilistic interpretation and allows efficient training of standard generative modeling architectures. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in both training efficiency and generation quality, and that it beats autoregressive models for the first time at this scale. The code and a blog post are available on the project page: https://oussamazekri.fr/gdds.
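The abstract states that the forward process relies on uniformization. As a generic illustration only (this is not the GDDS code, which is on the project page), the sketch below shows textbook uniformization for a continuous-time Markov chain with a dense generator `Q`. It picks a rate `lam >= max_i |Q_ii|`, draws a Poisson(`lam * t`) number of candidate jumps, and applies the fixed kernel `P = I + Q / lam` at each one. The helpers `masked_generator` and `uniform_generator` are hypothetical names for two standard corruption generators chosen as examples. A large-vocabulary implementation would need structured rather than dense rate matrices, which this sketch does not attempt.

```python
import numpy as np


def uniformization_sample(x0, Q, t, rng):
    """Draw x_t ~ exp(t Q)[x0, :] by uniformization (generic sketch).

    Q is a dense (S, S) CTMC generator: off-diagonal rates >= 0, rows sum to 0.
    """
    Q = np.asarray(Q, dtype=float)
    S = Q.shape[0]
    # Any lam >= max_i |Q_ii| is valid; use the smallest such rate.
    lam = float(np.max(-np.diag(Q)))
    if lam == 0.0:
        return int(x0)  # Q == 0: the chain never moves
    # Kernel of the uniformized discrete-time chain (self-loops allowed).
    P = np.eye(S) + Q / lam
    # Number of candidate jumps on [0, t] is Poisson(lam * t).
    n_jumps = rng.poisson(lam * t)
    x = int(x0)
    for _ in range(n_jumps):
        x = int(rng.choice(S, p=P[x]))
    return x


def uniformization_marginal(Q, t, n_terms=200):
    """Approximate exp(t Q) = sum_k Poisson(k; lam t) P^k, truncated at n_terms.

    The recursion starts from exp(-lam t), so it suits moderate lam * t only.
    """
    Q = np.asarray(Q, dtype=float)
    S = Q.shape[0]
    lam = float(np.max(-np.diag(Q)))
    if lam == 0.0:
        return np.eye(S)
    P = np.eye(S) + Q / lam
    w = np.exp(-lam * t)  # Poisson weight for k = 0
    Pk = np.eye(S)
    out = w * Pk
    for k in range(1, n_terms):
        w *= lam * t / k  # Poisson weight for k
        Pk = Pk @ P
        out = out + w * Pk
    return out


def masked_generator(vocab_size, rate=1.0):
    """Hypothetical helper: absorbing corruption, each token jumps to index vocab_size."""
    S = vocab_size + 1
    Q = np.zeros((S, S))
    Q[:vocab_size, vocab_size] = rate
    Q[np.arange(vocab_size), np.arange(vocab_size)] = -rate
    return Q


def uniform_generator(num_states, rate=1.0):
    """Hypothetical helper: at the given rate, resample the state uniformly (possibly to itself)."""
    return rate * (np.ones((num_states, num_states)) / num_states - np.eye(num_states))
```

For example, `uniformization_sample(0, masked_generator(4), 0.7, np.random.default_rng(0))` returns either token `0` or the mask index `4`, the latter with probability 1 - e^{-0.7}. The appeal of uniformization in this setting is that any finite-rate generator `Q` reduces to a Poisson number of draws from one fixed kernel `P`, so swapping the corruption dynamics only changes `Q`.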