Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion
January 20, 2026
Authors: Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang
cs.AI
Abstract
One of the most compelling features of discrete diffusion language models is their ability to model bidirectional context globally. However, existing block-based diffusion studies tend to introduce autoregressive priors which, while beneficial, can cause models to lose this coherence at the global level. To reclaim global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia inherent in block diffusion models. Our approach first employs block diffusion with small blocks to generate a rapid draft, then refines that draft through global bidirectional diffusion with a larger receptive field. We use snapshot confidence remasking to identify the tokens most in need of revision, and apply mix-scale training to extend the block diffusion model's global modeling capability. Empirical results show that our approach sets a new state of the art for discrete diffusion models on the OpenWebText dataset: using only 26% of the baseline's fine-tuning budget, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the gap to autoregressive models.
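
The abstract outlines a three-stage procedure: block-wise drafting, confidence-based remasking, and global bidirectional refinement. Below is a minimal sketch of that loop in PyTorch, assuming hypothetical denoiser callables block_denoiser and global_denoiser (not the paper's actual interfaces) and collapsing each diffusion stage into a single greedy denoising pass for brevity.

    # Minimal sketch of a draft-then-refine loop; block_denoiser and
    # global_denoiser are hypothetical callables mapping token ids of shape
    # (seq_len,) to per-position logits of shape (seq_len, vocab).
    import torch

    MASK_ID = 0  # hypothetical id of the [MASK] token

    def draft_then_refine(block_denoiser, global_denoiser, seq_len,
                          block_size=16, remask_ratio=0.3, device="cpu"):
        tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long, device=device)
        confidence = torch.zeros(seq_len, device=device)

        # Stage 1: semi-autoregressive draft -- fill one small block at a time,
        # taking a confidence "snapshot" (the committed token's probability).
        for start in range(0, seq_len, block_size):
            end = min(start + block_size, seq_len)
            probs = block_denoiser(tokens)[start:end].softmax(-1)
            tokens[start:end] = probs.argmax(-1)
            confidence[start:end] = probs.max(-1).values

        # Stage 2: snapshot-confidence remasking -- re-mask the least confident
        # draft tokens so they can be rewritten with full bidirectional context.
        remask_idx = confidence.argsort()[: int(remask_ratio * seq_len)]
        tokens[remask_idx] = MASK_ID

        # Stage 3: global bidirectional refinement of the remasked positions only.
        refined = global_denoiser(tokens).argmax(-1)
        tokens[remask_idx] = refined[remask_idx]
        return tokens

In this reading, the confidence snapshot is recorded at the moment each draft token is committed, so the remasking step targets exactly the positions the small-block pass was least sure about; the real method presumably runs multiple denoising iterations per stage rather than the single greedy pass shown here.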