Discrete Diffusion in Large Language and Multimodal Models: A Survey

Abstract

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements catalyzed a surge in dLLM and dMLLM research in early 2025. This survey presents a comprehensive overview of research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey
