Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

Abstract

Diffusion large language models (dLLMs) theoretically permit tokens to be decoded in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs allow. In practice, however, random-order decoding often hurts generation quality. Low-confidence remasking mitigates this by prioritizing confident tokens, which improves single-sample quality (e.g., Pass@1) but suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and we develop a simple Independent Metropolis-Hastings sampler that approximately targets this distribution during decoding. Experiments on reasoning benchmarks including MATH500, AIME24/25, HumanEval, and MBPP show that our approach achieves a better exploration-quality trade-off than both random and low-confidence remasking.
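
The abstract does not give the form of the optimal distribution or the sampler's proposal, so what follows is only a minimal sketch under stated assumptions, not the paper's method. It assumes the standard entropy-regularized objective, maximize E_pi[log p(x)] + tau * H(pi), whose maximizer is pi(x) ∝ p(x)^(1/tau); tau is a hypothetical knob, with tau < 1 sharpening toward high-likelihood outputs and tau > 1 flattening toward exploration. An Independent Metropolis-Hastings chain draws each proposal from a fixed distribution q that ignores the current state (here assumed to be complete samples from the base model) and accepts it with probability min(1, pi(x') q(x) / (pi(x) q(x'))). The four-answer base model, `run`, `sample_base`, and every other name below are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def independent_mh(
    propose: Callable[[random.Random], T],
    log_q: Callable[[T], float],
    log_target: Callable[[T], float],
    n_steps: int,
    rng: random.Random,
) -> List[T]:
    """Independent Metropolis-Hastings: each proposal ignores the current state.

    A candidate x_new replaces x with probability
    min(1, pi(x_new) q(x) / (pi(x) q(x_new))), evaluated in log space,
    so log_target may be unnormalized.
    """
    x = propose(rng)
    chain: List[T] = []
    for _ in range(n_steps):
        x_new = propose(rng)
        log_ratio = (log_target(x_new) - log_q(x_new)) - (log_target(x) - log_q(x))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
        chain.append(x)
    return chain


def empirical_entropy(samples: List[T]) -> float:
    """Shannon entropy (nats) of the empirical distribution of samples."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


# Hypothetical base model: probabilities of four complete answers.
BASE: Dict[str, float] = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}
ITEMS = list(BASE)
WEIGHTS = [BASE[k] for k in ITEMS]


def sample_base(rng: random.Random) -> str:
    # Stand-in for one full decoding pass of the base model (assumption).
    return rng.choices(ITEMS, weights=WEIGHTS)[0]


def log_base(x: str) -> float:
    return math.log(BASE[x])


def run(tau: float, n_steps: int = 50_000, seed: int = 0) -> List[str]:
    # Assumed target: pi(x) ∝ p(x)^(1/tau), the maximizer of E[log p] + tau * H.
    return independent_mh(
        propose=sample_base,
        log_q=log_base,
        log_target=lambda x: log_base(x) / tau,
        n_steps=n_steps,
        rng=random.Random(seed),
    )


if __name__ == "__main__":
    rng = random.Random(1)
    base_samples = [sample_base(rng) for _ in range(50_000)]
    print(f"base model entropy: {empirical_entropy(base_samples):.3f} nats")
    for tau in (0.5, 1.0, 2.0):
        print(f"tau={tau}: IMH entropy {empirical_entropy(run(tau)):.3f} nats")
```

Running the script shows the empirical entropy rising with tau: the sharpened target (tau = 0.5) concentrates on the most likely answer, loosely mirroring the entropy constraint the abstract attributes to low-confidence remasking, while tau = 2.0 spreads mass over more answers. Because q equals the base model here, the acceptance ratio reduces to (p(x')/p(x))^(1/tau - 1), so tau = 1 accepts every proposal.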