Discrete Diffusion in Large Language and Multimodal Models: A Survey
June 16, 2025
Authors: Runpeng Yu, Qi Li, Xinchao Wang
cs.AI
Abstract
In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x faster inference.
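
To make the decoding paradigm concrete, the snippet below is a minimal toy sketch of mask-based parallel denoising decoding: generation starts from a fully masked sequence, and at each step the most confident predictions for several positions are committed at once. The names `score_fn`, `parallel_decode`, `MASK`, and `VOCAB` are illustrative placeholders, not APIs from any surveyed model; a real dLLM would obtain the per-position predictions from a single bidirectional, full-attention forward pass of a Transformer denoiser.

```python
# Toy sketch of mask-based parallel denoising decoding (illustrative only).
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def score_fn(seq):
    """Stand-in for the denoiser: returns (token, confidence) for each masked
    position. A real dLLM would produce these jointly from one full-attention
    forward pass over the whole sequence."""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def parallel_decode(length=8, steps=4):
    seq = [MASK] * length  # start from an all-masked canvas
    for _ in range(steps):
        preds = score_fn(seq)
        if not preds:
            break
        # Unmask the most confident half of the remaining positions each step,
        # so several tokens are committed in parallel rather than one by one.
        k = max(1, len(preds) // 2)
        ranked = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _) in ranked[:k]:
            seq[pos] = tok
    # Fill any positions that are still masked in a final pass.
    for pos, (tok, _) in score_fn(seq).items():
        seq[pos] = tok
    return seq

print(" ".join(parallel_decode()))
```

Because multiple positions are committed per step, the number of forward passes can be far smaller than the sequence length, which is the source of the inference speedups reported for d(M)LLMs; the unmasking schedule (how many positions to commit per step, and by what criterion) is one of the key inference-time design choices discussed in the survey.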
The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLM and dMLLM research in early 2025.
In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment.
Paper collection: https://github.com/LiQiiiii/DLLM-Survey