Balancing Understanding and Generation in Discrete Diffusion Models
February 1, 2026
Authors: Yue Liu, Yuzhong Zhao, Zheyong Xie, Qixiang Ye, Jianbin Jiao, Yao Hu, Shaosheng Cao, Yunfan Liu
cs.AI
Abstract
In discrete generative modeling, two dominant paradigms demonstrate divergent capabilities: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalization, whereas Uniform-noise Diffusion Language Models (UDLM) achieve strong few-step generation quality, yet neither attains balanced performance across both dimensions. To address this, we propose XDLM, which bridges the two paradigms via a stationary noise kernel. XDLM offers two key contributions: (1) it provides a principled theoretical unification of MDLM and UDLM, recovering each paradigm as a special case; and (2) it alleviates the memory bottleneck through an algebraic simplification of the posterior probabilities. Experiments demonstrate that XDLM advances the Pareto frontier between understanding capability and generation quality. Quantitatively, XDLM surpasses UDLM by 5.4 points on zero-shot text benchmarks and outperforms MDLM in few-step image generation (FID 54.1 vs. 80.8). When scaled to tune an 8B-parameter large language model, XDLM reaches an MBPP score of 15.0 in just 32 steps, effectively doubling the baseline performance. Finally, analysis of training dynamics reveals XDLM's superior potential for long-term scaling. Code is available at https://github.com/MzeroMiko/XDLM
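To make the idea of a stationary noise kernel more concrete, the sketch below shows one common way such a kernel can interpolate between masked and uniform corruption: a forward transition Q_t = alpha_t * I + (1 - alpha_t) * 1 pi^T keeps a token with probability alpha_t and otherwise resamples it from a stationary distribution pi that mixes a [MASK] token (MDLM-like at lam = 1) with the uniform distribution over the vocabulary (UDLM-like at lam = 0). This is only an illustrative assumption about the kernel family, not XDLM's actual formulation; the names lam, alpha_t, and mask_id are hypothetical.

```python
import numpy as np

def stationary_distribution(vocab_size, mask_id, lam):
    """Stationary distribution pi that interpolates between a pure-mask
    target (lam = 1, MDLM-like) and a uniform target (lam = 0, UDLM-like).
    Hypothetical parameterization, for illustration only."""
    pi = np.full(vocab_size, (1.0 - lam) / vocab_size)
    pi[mask_id] += lam
    return pi

def forward_transition(pi, alpha_t):
    """One-step forward kernel Q_t = alpha_t * I + (1 - alpha_t) * 1 pi^T.
    Each row sums to 1, and pi is a stationary distribution of Q_t."""
    vocab_size = pi.shape[0]
    identity = np.eye(vocab_size)
    resample = np.ones((vocab_size, 1)) * pi[None, :]
    return alpha_t * identity + (1.0 - alpha_t) * resample

# Toy usage: corrupt a short token sequence with the interpolated kernel.
rng = np.random.default_rng(0)
vocab_size, mask_id = 8, 7
pi = stationary_distribution(vocab_size, mask_id, lam=0.5)
Q_t = forward_transition(pi, alpha_t=0.6)
x0 = np.array([1, 2, 3])
xt = np.array([rng.choice(vocab_size, p=Q_t[token]) for token in x0])
print("pi sums to", pi.sum(), "| corrupted sequence:", xt)
```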