DMax: Aggressive Parallel Decoding for dLLMs

Abstract

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding, which represents each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revision in embedding space. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves tokens per forward pass (TPF) on GSM8K from 2.04 to 5.47 while preserving accuracy. On MBPP, it increases TPF from 2.71 to 5.86 while maintaining comparable performance. On two H200 GPUs, our model achieves an average of 1,338 tokens per second (TPS) at batch size 1. Code is available at: https://github.com/czg1225/DMax
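To make the interpolation idea behind Soft Parallel Decoding concrete, here is a minimal sketch of a single intermediate state computed as a convex combination of a predicted token embedding and the mask embedding. The function name `soft_state`, the scalar `weight`, and the toy two-dimensional embeddings are all illustrative assumptions; the abstract does not say how the interpolation weight is chosen (for example, whether it comes from model confidence), so this should not be read as the authors' implementation.

```python
from typing import List


def soft_state(token_emb: List[float], mask_emb: List[float], weight: float) -> List[float]:
    """Interpolate between a predicted token embedding and the mask embedding.

    Hypothetical helper: weight = 0.0 returns the mask embedding (fully
    undecided position), weight = 1.0 returns the token embedding (fully
    committed position). How the weight is set is an assumption here.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(token_emb) != len(mask_emb):
        raise ValueError("embeddings must have the same dimension")
    # Element-wise convex combination of the two embeddings.
    return [weight * t + (1.0 - weight) * m for t, m in zip(token_emb, mask_emb)]


# Toy example: a position that is only 25% committed to its predicted token.
partial = soft_state([1.0, 2.0], [0.0, 0.0], 0.25)
```

In this toy setting, a position whose weight grows across decoding steps moves gradually from the mask embedding toward the token embedding, which is the sense in which the abstract contrasts soft refinement with a binary mask-to-token switch.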
