DINGO: Constrained Inference for Diffusion LLMs

Abstract

Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models cannot provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models, which generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. Because traditional constrained decoding algorithms are designed for sequential token prediction, this parallelism makes them ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of the output strings with the highest probability under the model's predicted distribution while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves an improvement of up to 68 percentage points over unconstrained inference.
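To make the dynamic-programming idea concrete, the following is a minimal sketch and not the DINGO algorithm itself. It assumes that the model yields an independent probability distribution at each position of a block, that the regular expression has already been compiled into a deterministic finite automaton (DFA) over tokens, and that the whole block must end in an accepting state. The function name `constrained_argmax`, the toy vocabulary, and the transition table `delta` are hypothetical and chosen only for illustration.

```python
import math


def constrained_argmax(probs, delta, start, accepting):
    """Return the most probable token sequence accepted by a DFA.

    probs:     list with one dict per position, mapping token -> probability
               (assumed independent across positions in the block).
    delta:     dict mapping (state, token) -> next state; a missing entry
               means the DFA rejects that token in that state.
    start:     initial DFA state.
    accepting: set of accepting DFA states.
    """
    # best[q] holds (log-probability, tokens) of the best prefix ending in state q.
    best = {start: (0.0, [])}
    for dist in probs:
        nxt = {}
        for q, (score, toks) in best.items():
            for tok, p in dist.items():
                q2 = delta.get((q, tok))
                if q2 is None or p <= 0.0:
                    continue  # transition not allowed, or token has zero mass
                cand = score + math.log(p)
                if q2 not in nxt or cand > nxt[q2][0]:
                    nxt[q2] = (cand, toks + [tok])
        best = nxt
    finals = [entry for q, entry in best.items() if q in accepting]
    if not finals:
        return None  # no string of this length satisfies the constraint
    score, toks = max(finals, key=lambda entry: entry[0])
    return toks, math.exp(score)


# Toy DFA for the regular expression [12](\+[12])* over tokens "1", "2", "+".
delta = {(0, "1"): 1, (0, "2"): 1, (1, "+"): 0}

# Hypothetical per-position distributions for a block of three tokens.
probs = [
    {"1": 0.5, "2": 0.3, "+": 0.2},
    {"1": 0.6, "2": 0.1, "+": 0.3},
    {"1": 0.2, "2": 0.7, "+": 0.1},
]

result = constrained_argmax(probs, delta, start=0, accepting={1})
print(result)
```

In this toy setting, taking the most likely token at each position independently gives "1", "1", "2", which the DFA rejects. The dynamic program instead keeps one best prefix per DFA state, so its cost grows with the block length times the number of states times the vocabulary size, and it returns the highest-probability sequence among those the DFA accepts.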
