Multi-Turn Reflective Masking Elicits Reasoning in Masked Diffusion Models

Abstract

Reasoning in autoregressive (AR) models typically relies on chain-of-thought reasoning and reflection, yet refining a previous output still requires fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Masked Diffusion Models (MDMs) naturally supports explicit local edits to previous outputs, allowing selective refinement without discarding a previous answer and generating a new one from scratch. Although this property aligns more closely with how humans correct mistakes through iterative local refinement, existing MDMs do not support multi-turn masking and denoising. We propose Reflective Masking (RM), which elicits this intrinsic reasoning capability in MDMs through lightweight post-training. RM provides native test-time scaling: an MDM iteratively revisits and revises its prior outputs based on the evolving context. To exploit insights from previous turns, as AR reasoning does, we further introduce History Reference, a parameter-free mechanism that leverages intermediate denoising states during revision. Our approach requires no architectural changes and is readily applicable to existing MDMs. Across diverse tasks and modalities, including text generation, Sudoku, and image editing, Reflective Masking consistently outperforms standard masking-based baselines and demonstrates strong generality, positioning RM as a fundamental primitive for reasoning in MDMs.
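The multi-turn idea can be illustrated with a toy loop: re-mask only the positions that look wrong, then denoise only those positions, and repeat. The sketch below is not the paper's method. The function names (`multi_turn_refine`, `find_conflicts`, `fill_missing`), the rule-based mask selection, and the rule-based denoiser are all hypothetical stand-ins for what a post-trained MDM would do, and the per-turn history is merely recorded, since the abstract does not specify how History Reference consumes intermediate states.

```python
from typing import Callable, List, Optional, Set, Tuple

MASK = None  # hypothetical placeholder for a masked position
Row = List[Optional[int]]


def find_conflicts(tokens: List[int]) -> Set[int]:
    """Toy mask selection: flag repeated or out-of-range values in a row
    that should be a permutation of 1..n (a single Sudoku-style row)."""
    seen: Set[int] = set()
    conflicts: Set[int] = set()
    for i, t in enumerate(tokens):
        if t in seen or not (1 <= t <= len(tokens)):
            conflicts.add(i)
        else:
            seen.add(t)
    return conflicts


def fill_missing(tokens: Row) -> List[int]:
    """Toy denoiser: fill masked positions with the values not yet present.
    A real MDM would predict these tokens from the unmasked context."""
    n = len(tokens)
    present = {t for t in tokens if t is not MASK}
    missing = [v for v in range(1, n + 1) if v not in present]
    out = list(tokens)
    for i, t in enumerate(out):
        if t is MASK:
            out[i] = missing.pop(0)
    return out


def multi_turn_refine(
    tokens: List[int],
    denoise: Callable[[Row], List[int]],
    select_remask: Callable[[List[int]], Set[int]],
    max_turns: int,
) -> Tuple[List[int], List[List[int]]]:
    """Repeat: choose positions to re-mask, mask them, denoise them.
    Unselected positions are kept, so each turn is a local edit."""
    history = [list(tokens)]  # one snapshot per turn (recorded only)
    for _ in range(max_turns):
        positions = select_remask(tokens)
        if not positions:
            break  # nothing left to revise
        masked: Row = [MASK if i in positions else t for i, t in enumerate(tokens)]
        tokens = denoise(masked)
        history.append(list(tokens))
    return tokens, history


final, history = multi_turn_refine([1, 2, 2, 4], fill_missing, find_conflicts, max_turns=3)
print(final)         # [1, 2, 3, 4]
print(len(history))  # 2 (initial answer plus one revision turn)
```

In this toy, only the conflicting position is rewritten while the rest of the earlier answer is preserved, which is the contrast with regenerating the full sequence that the abstract draws.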