Discrete Diffusion in Large Language and Multimodal Models: A Survey

Abstract

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements catalyzed a surge in dLLM and dMLLM research in early 2025. We present a comprehensive overview of the research in the dLLM and dMLLM domains: we trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey
