

Sequential Diffusion Language Models

September 28, 2025
作者: Yangzhou Liu, Yue Cao, Hao Li, Gen Luo, Zhe Chen, Weiyun Wang, Xiaobo Liang, Biqing Qi, Lijun Wu, Changyao Tian, Yanting Zhang, Yuqiang Li, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang
cs.AI

Abstract

Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value (KV) caches. Block diffusion mitigates these issues, yet still enforces a fixed block size and requires expensive training. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction, enabling the model to adaptively determine the generation length at each step. When the length is fixed to 1, NSP reduces to standard next-token prediction. Building on NSP, we propose the Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost. Specifically, SDLM performs diffusion inference within fixed-size mask blocks, but dynamically decodes consecutive subsequences based on model confidence, thereby preserving KV-cache compatibility and improving robustness to varying uncertainty and semantics across the sequence. Experiments show that SDLM matches or surpasses strong autoregressive baselines using only 3.5M training samples, while achieving 2.1× higher throughput than Qwen-2.5. Notably, the SDLM-32B model delivers even more pronounced efficiency gains, demonstrating the strong scalability potential of our modeling paradigm. Project page and code: https://github.com/OpenGVLab/SDLM