

Discrete Diffusion in Large Language and Multimodal Models: A Survey

June 16, 2025
Authors: Runpeng Yu, Qi Li, Xinchao Wang
cs.AI

Abstract

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second is the evolution of the mathematical models underlying discrete diffusion. Together, these advances have catalyzed a surge in dLLM and dMLLM research in early 2025. In this work, we present a comprehensive overview of research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey
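To make the decoding paradigm described in the abstract concrete, below is a minimal sketch of confidence-based parallel unmasking, the denoising-style generation loop that dLLMs use in place of left-to-right sampling. The names `model`, `MASK_ID`, and `diffusion_decode` are illustrative assumptions, not the API of any surveyed model; the step count and unmasking schedule are likewise simplified.

```python
import torch

MASK_ID = 0  # assumed id of the [MASK] token (illustrative; model-specific in practice)

def diffusion_decode(model, prompt_ids, gen_len=32, num_steps=8):
    """Sketch of parallel denoising decoding: start from a fully masked
    response and, at each step, fill in the most confident positions."""
    # Prompt followed by a fully masked (i.e., maximally "noised") response.
    seq = torch.cat([prompt_ids,
                     torch.full((gen_len,), MASK_ID, dtype=prompt_ids.dtype)])
    tokens_per_step = max(1, gen_len // num_steps)

    for _ in range(num_steps):
        masked = (seq == MASK_ID).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        # Full (bidirectional) attention over the entire sequence; `model` is
        # assumed to return logits of shape (batch, seq_len, vocab_size).
        logits = model(seq.unsqueeze(0)).squeeze(0)
        probs = logits.softmax(dim=-1)
        conf, pred = probs[masked].max(dim=-1)   # best token and confidence per masked slot
        k = min(tokens_per_step, masked.numel())
        keep = conf.topk(k).indices              # unmask the k most confident positions in parallel
        seq[masked[keep]] = pred[keep]
    return seq
```

Whereas an AR model emits one token per forward pass, each denoising step here can commit several tokens at once, which is the intuition behind the inference speedups reported for d(M)LLMs.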