
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

April 1, 2026
作者: Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Enze Ma, Leyi Pan, Chunyu Miao, Wei-Chieh Huang, Xue Liu, Philip S. Yu
cs.AI

Abstract

Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis-Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks including MATH500, AIME24/25, HumanEval, and MBPP show that our approach yields a better exploration-quality tradeoff than both random and low-confidence remasking.
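The abstract does not spell out the sampler's mechanics, but the Independent Metropolis-Hastings (IMH) scheme it names is standard: a proposal is drawn independently of the current state and accepted with probability min(1, [pi(x')q(x)] / [pi(x)q(x')]). The sketch below is a generic, toy illustration of that accept step over a small discrete state space, not the paper's actual decoding-time sampler; `imh_step`, `log_target`, and `log_proposal` are hypothetical names introduced here for illustration.

```python
import math
import random

def imh_step(current, propose, log_target, log_proposal):
    """One Independent Metropolis-Hastings step.

    The proposal x' is drawn independently of the current state x;
    it is accepted with probability
        min(1, [pi(x') q(x)] / [pi(x) q(x')]),
    computed in log space for numerical stability.
    """
    candidate = propose()
    log_alpha = (log_target(candidate) - log_target(current)
                 + log_proposal(current) - log_proposal(candidate))
    if math.log(random.random()) < min(0.0, log_alpha):
        return candidate  # accepted: move to the proposed state
    return current        # rejected: keep the current state

# Toy usage: target a discrete distribution over {0, 1, 2}
# using a uniform independent proposal.
target_probs = [0.1, 0.6, 0.3]
log_target = lambda x: math.log(target_probs[x])
propose = lambda: random.randrange(3)
log_proposal = lambda x: math.log(1.0 / 3.0)

random.seed(0)
counts = [0, 0, 0]
x = 0
for _ in range(20000):
    x = imh_step(x, propose, log_target, log_proposal)
    counts[x] += 1
# The empirical visit frequencies should roughly track target_probs.
```

In the paper's setting, the target would be the quality-exploration-balanced sequence distribution and the proposal would come from the dLLM's own denoising predictions; the toy above only demonstrates the accept/reject rule itself.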