

Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

April 1, 2026
作者: Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Enze Ma, Leyi Pan, Chunyu Miao, Wei-Chieh Huang, Xue Liu, Philip S. Yu
cs.AI

Abstract

Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis-Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks, including MATH500, AIME24/25, HumanEval, and MBPP, show that our approach yields a better exploration-quality trade-off than both random and low-confidence remasking.
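To make the sampling idea concrete, below is a minimal toy sketch of an independent Metropolis-Hastings step on a small discrete state space. This is not the paper's decoder-level sampler; the function name, the toy target, and the uniform proposal are illustrative assumptions. The key property shown is that the proposal is drawn independently of the current state, and the acceptance ratio min(1, p(y)q(x) / (p(x)q(y))) corrects the proposal distribution toward the target.

```python
import random

def independent_mh(target, proposal_probs, steps=10_000, seed=0):
    """Toy independent Metropolis-Hastings sampler (illustrative only).

    target: dict mapping state -> unnormalized target weight p(x)
    proposal_probs: dict mapping state -> proposal probability q(x);
        the proposal does not depend on the current state, which is
        what makes this an *independence* sampler.
    Returns the list of visited states (the Markov chain).
    """
    rng = random.Random(seed)
    states = list(proposal_probs)
    weights = [proposal_probs[s] for s in states]
    x = rng.choices(states, weights=weights)[0]  # initialize from the proposal
    chain = []
    for _ in range(steps):
        y = rng.choices(states, weights=weights)[0]  # proposal ignores current x
        # Acceptance ratio for an independence sampler:
        # a = min(1, p(y) q(x) / (p(x) q(y)))
        a = min(1.0,
                (target[y] * proposal_probs[x]) /
                (target[x] * proposal_probs[y]))
        if rng.random() < a:
            x = y  # accept the proposal
        chain.append(x)
    return chain
```

Run long enough, the empirical state frequencies approach the normalized target: for a target of p('a') = 3, p('b') = 1 under a uniform proposal, roughly three quarters of the chain should sit at 'a'. In the paper's setting, the same accept/reject correction is applied during decoding so that samples follow a distribution balancing quality against entropy, rather than the greedy low-confidence-remasking distribution.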