Continuous Latent Diffusion Language Model
May 7, 2026
Authors: Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu, Qiushan Guo, Feng Wang, Tao Yang, Hengshuang Zhao, Guoqiang Wei, Yan Zeng
cs.AI
Abstract
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.
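To make the pipeline described above concrete, the following is a minimal, self-contained sketch of the three-stage generation flow: a Text VAE maps tokens to continuous latent blocks, a block-causal DiT denoises each new block in latent space conditioned on earlier blocks (latent prior transport), and a conditional decoder realizes the latents as text. All module names, shapes, and the Euler-style sampler here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: module names, shapes, and the sampler are assumptions,
# not the paper's released implementation.
import torch
import torch.nn as nn

class TinyTextVAE(nn.Module):
    """Stand-in Text VAE: maps token ids to per-block continuous latents and back."""
    def __init__(self, vocab=100, d=64, block=4):
        super().__init__()
        self.block = block
        self.emb = nn.Embedding(vocab, d)
        self.head = nn.Linear(d, vocab)

    def encode(self, tokens):                        # (B, T) -> (B, T // block, d)
        h = self.emb(tokens)
        b, t, d = h.shape
        return h.view(b, t // self.block, self.block, d).mean(dim=2)

    def decode(self, latents):                       # (B, N, d) -> (B, N * block) token ids
        h = latents.repeat_interleave(self.block, dim=1)
        return self.head(h).argmax(dim=-1)

class TinyLatentDiT(nn.Module):
    """Stand-in block-causal DiT: predicts a denoising direction for the current latent block."""
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d + 1, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, z, t, context):
        ctx = context.mean(dim=1, keepdim=True).expand_as(z)   # causal summary of past blocks
        t = t.view(-1, 1, 1).expand(z.size(0), 1, 1)
        return self.net(torch.cat([z, ctx, t], dim=-1))

@torch.no_grad()
def generate(vae, dit, prompt_tokens, num_blocks=3, steps=10):
    context = vae.encode(prompt_tokens)              # stage 1: prompt -> latent blocks
    for _ in range(num_blocks):
        z = torch.randn(context.size(0), 1, context.size(-1))
        for step in range(steps, 0, -1):             # stage 2: latent prior transport by iterative denoising
            t = torch.full((context.size(0),), step / steps)
            z = z - dit(z, t, context) / steps       # simple Euler-style update of the latent block
        context = torch.cat([context, z], dim=1)
    return vae.decode(context)                       # stage 3: conditional decoding back to tokens

vae, dit = TinyTextVAE(), TinyLatentDiT()
tokens = generate(vae, dit, torch.randint(0, 100, (1, 8)))
print(tokens.shape)                                  # torch.Size([1, 20]): (2 prompt + 3 generated) blocks * 4 tokens
```

The structural point the sketch is meant to convey is that the diffusion loop operates entirely on continuous latent blocks; discrete tokens appear only in the final decoding step.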