Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion
January 20, 2026
Authors: Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang
cs.AI
Abstract
One of the most compelling features of discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while offering benefits, can cause models to lose this global coherence at the macro level. To regain global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia inherent in block diffusion models. Our approach first employs block diffusion with small blocks to generate drafts quickly, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We use snapshot confidence remasking to identify the most critical tokens requiring modification, and apply mix-scale training to extend the block diffusion model's global capabilities. Empirical results show that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset: using only 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the performance gap with autoregressive models.
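To make the pipeline concrete, below is a minimal, hypothetical sketch of how a 'draft-then-refine' loop with confidence-snapshot remasking might be wired together. It is not the paper's code: the callables draft_step and refine_step, the MASK_ID constant, the block size, and the remask ratio are all illustrative assumptions standing in for the block-diffusion and global bidirectional denoisers described in the abstract.

import torch

MASK_ID = 0  # assumed mask token id (placeholder)

def draft_then_refine(draft_step, refine_step, seq_len, block_size=16, remask_ratio=0.2):
    """draft_step(tokens, block_slice) and refine_step(tokens) are assumed to
    return logits of shape (seq_len, vocab_size); they stand in for the
    small-block and global bidirectional diffusion denoisers."""
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
    confidence = torch.zeros(seq_len)

    # Stage 1: semi-autoregressive draft, decoding one small block at a time.
    for start in range(0, seq_len, block_size):
        block = slice(start, min(start + block_size, seq_len))
        logits = draft_step(tokens, block)            # (seq_len, vocab_size)
        probs = logits[block].softmax(-1)
        conf, pred = probs.max(-1)
        tokens[block] = pred
        confidence[block] = conf                      # snapshot per-token confidence

    # Stage 2: re-mask the least-confident tokens from the confidence snapshot.
    k = max(1, int(remask_ratio * seq_len))
    remask_idx = confidence.topk(k, largest=False).indices
    tokens[remask_idx] = MASK_ID

    # Stage 3: refine the re-masked positions with a global bidirectional pass.
    logits = refine_step(tokens)
    tokens[remask_idx] = logits[remask_idx].argmax(-1)
    return tokens

# Toy usage with random "denoisers", only to show the call pattern.
if __name__ == "__main__":
    V, L = 100, 64
    out = draft_then_refine(lambda t, b: torch.randn(L, V), lambda t: torch.randn(L, V), seq_len=L)
    print(out.shape)  # torch.Size([64])

In practice the draft pass would condition each block on previously decoded blocks (the autoregressive prior), while the refine pass attends over the full sequence; the sketch only captures the control flow of drafting, snapshotting confidences, remasking, and refining.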