Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling

Abstract

Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void': semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion (CADD), a framework that augments the discrete state space with a paired diffusion in a continuous latent space. This yields graded, gradually corrupted states in which masked tokens are represented by noisy yet informative latent vectors rather than collapsed 'information voids'. At each reverse step, CADD may leverage the continuous latent as a semantic hint to guide discrete denoising. The design is clean and compatible with existing discrete diffusion training. At sampling time, the strength and choice of estimator for the continuous latent vector enable a controlled trade-off between mode-coverage behavior (generating diverse outputs) and mode-seeking behavior (generating contextually precise outputs). Empirically, we demonstrate that CADD improves generative quality over mask-based diffusion across text generation, image synthesis, and code modeling, with consistent gains on both qualitative and quantitative metrics against strong discrete baselines.
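
The idea of pairing each masked token with a noisy but informative latent vector can be illustrated with a minimal sketch of a forward corruption step. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name forward_corrupt, the toy embedding table, the linear masking probability t, and the variance-preserving Gaussian noise schedule are all hypothetical choices.

```python
import math
import random

MASK = "[MASK]"


def forward_corrupt(tokens, embeddings, t, rng):
    """Corrupt a token sequence at noise level t in [0, 1].

    Assumed (hypothetical) schedule, not the paper's:
      - each token is masked independently with probability t;
      - a masked position keeps a latent equal to
        sqrt(1 - t) * embedding + sqrt(t) * Gaussian noise;
      - an unmasked position keeps its clean embedding.
    Returns (discrete_state, latents).
    """
    signal, noise = math.sqrt(1.0 - t), math.sqrt(t)
    discrete, latents = [], []
    for tok in tokens:
        emb = embeddings[tok]
        if rng.random() < t:
            # Discrete state collapses to [MASK], but the latent still
            # carries a noisy trace of the original token's embedding.
            discrete.append(MASK)
            latents.append([signal * e + noise * rng.gauss(0.0, 1.0) for e in emb])
        else:
            discrete.append(tok)
            latents.append(list(emb))
    return discrete, latents


if __name__ == "__main__":
    rng = random.Random(0)
    toy_embeddings = {"cat": [1.0, 0.0], "sat": [0.0, 1.0]}
    state, latents = forward_corrupt(["cat", "sat", "cat"], toy_embeddings, 0.5, rng)
    print(state)
    print(latents)
```

Under these assumptions, a masked position at an intermediate noise level is still distinguishable from other masked positions through its latent, which is the kind of semantic hint a denoiser could condition on at each reverse step. At t = 1 the latent is pure noise, which matches the fully absorbed state of a standard mask-based model.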
