

Introspective Diffusion Language Models

April 13, 2026
Authors: Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu
cs.AI

Abstract

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which enables the model to verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand of large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
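The introspective acceptance rate described in the abstract can be illustrated with a minimal sketch. The exact definition used by the paper is not given here, so the following is a hypothetical instantiation under a simple assumption: a previously generated token counts as "accepted" if it remains the model's top (argmax) choice when the model re-scores its own output in a fresh forward pass.

```python
def introspective_acceptance_rate(prev_tokens, rescored_logits):
    """Fraction of previously generated tokens the model still prefers
    when it re-scores its own output (hypothetical argmax criterion).

    prev_tokens:     list[int] -- token ids the model generated earlier
    rescored_logits: list[list[float]] -- per-position logits from a
                     fresh forward pass over the same sequence
    """
    assert len(prev_tokens) == len(rescored_logits)
    accepted = 0
    for tok, logits in zip(prev_tokens, rescored_logits):
        # The token is "accepted" if it is still the model's top choice.
        if max(range(len(logits)), key=logits.__getitem__) == tok:
            accepted += 1
    return accepted / max(len(prev_tokens), 1)
```

Under this reading, an AR model scored with causal masking and shifted logits trivially achieves a rate of 1.0 on its own greedy generations, which is the structural advantage the abstract attributes to AR training.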