Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Abstract

Autoregressive (AR) models remain the standard for natural language generation but still suffer from high latency due to strictly sequential decoding. Recent diffusion-inspired approaches, such as LLaDA and Dream, mitigate this by generating tokens in parallel, yet they face two core limitations: information loss, because the predictive distributions for non-finalized tokens are discarded at each step, and premature commitment, because local decisions are made without sufficient global coordination. We introduce Latent Refinement Decoding (LRD), a two-stage framework consisting of Latent Refinement and a Predictive Feedback Loop. The first stage maintains masked positions as distributional mixtures of predicted tokens and the mask embedding, allowing the model to establish more globally consistent beliefs. The second stage progressively finalizes confident tokens while retaining uncertain ones for iterative feedback. KL-divergence dynamics provide a principled and reliable criterion for convergence and early stopping. Experiments on coding (HumanEval +6.3, MBPP +2.6) and reasoning (GSM8K +2.9, MATH500 +3.8) benchmarks show that LRD improves accuracy while delivering speedups of up to 10.6x, making it a strong and versatile alternative for parallel sequence generation.
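The abstract names three mechanisms without giving formulas: a mixture of predicted tokens and the mask embedding, confidence-based finalization, and a KL-divergence stopping criterion. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The mixing weight `alpha`, the confidence `threshold`, the tolerance `tol`, the direction of the KL term (current step against previous step), and all function names are assumptions introduced purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_embedding(probs, token_embeddings, mask_embedding, alpha):
    """Mix the expected token embedding with the mask embedding.

    alpha is a hypothetical mixing weight in [0, 1];
    alpha = 0 reproduces the plain mask embedding.
    """
    dim = len(mask_embedding)
    expected = [
        sum(p * emb[d] for p, emb in zip(probs, token_embeddings))
        for d in range(dim)
    ]
    return [alpha * expected[d] + (1.0 - alpha) * mask_embedding[d] for d in range(dim)]


def has_converged(prev_probs, curr_probs, tol=1e-3):
    """Assumed stopping rule: mean per-position KL(curr || prev) falls below tol."""
    kls = [kl_divergence(c, p) for p, c in zip(prev_probs, curr_probs)]
    return sum(kls) / len(kls) < tol


def finalize_confident(probs_per_position, threshold=0.9):
    """Return {position: token_id} for positions whose top probability reaches threshold."""
    decided = {}
    for pos, probs in enumerate(probs_per_position):
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:
            decided[pos] = top
    return decided


if __name__ == "__main__":
    # Toy vocabulary of three tokens with 2-dimensional embeddings (illustrative values).
    token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    mask_embedding = [0.0, 0.0]
    probs = softmax([2.0, 0.0, 0.0])
    print(soft_embedding(probs, token_embeddings, mask_embedding, alpha=0.5))
    print(finalize_confident([[0.95, 0.03, 0.02], probs]))
```

In this sketch a masked position keeps its full predictive distribution between steps through `soft_embedding`, rather than collapsing back to the bare mask embedding, which is one way to read the "information loss" problem the abstract describes. Uncertain positions stay soft until `finalize_confident` accepts them, and `has_converged` illustrates how a small change between successive distributions could trigger early stopping.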
