DINGO: Constrained Inference for Diffusion LLMs
May 29, 2025
作者: Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
cs.AI
Abstract
Diffusion LLMs have emerged as a promising alternative to conventional
autoregressive LLMs, offering significant potential for improved runtime
efficiency. However, existing diffusion models lack the ability to provably
enforce user-specified formal constraints, such as regular expressions, which
makes them unreliable for tasks that require structured outputs, such as
fixed-schema JSON generation. Unlike autoregressive models that generate tokens
sequentially, diffusion LLMs predict a block of tokens in parallel. This
parallelism makes traditional constrained decoding algorithms, which are
designed for sequential token prediction, ineffective at preserving the true
output distribution. To address this limitation, we propose DINGO, a dynamic
programming-based constrained decoding strategy that is both efficient and
provably distribution-preserving. DINGO enables sampling of output strings with
the highest probability under the model's predicted distribution, while
strictly satisfying any user-specified regular expression. On standard symbolic
math and JSON generation benchmarks, DINGO achieves up to a 68 percentage point
improvement over unconstrained inference.
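
To make the core idea concrete, here is a minimal sketch of Viterbi-style dynamic programming over DFA states for a parallel block prediction, under the simplifying assumption that the diffusion model gives an independent token distribution at each position in the block. The toy regex, DFA, vocabulary, and the function name `viterbi_constrained_block` are illustrative assumptions, not DINGO's actual implementation (which also addresses distribution-preserving sampling rather than only the argmax sequence).

```python
import numpy as np

# Toy DFA for the regex "ab*c" over a 3-token vocabulary {a, b, c}.
VOCAB = ["a", "b", "c"]
START, ACCEPT = 0, 2
# DELTA[state][token_id] -> next state, or None if the transition is illegal.
DELTA = {
    0: {0: 1, 1: None, 2: None},    # from start, only 'a' is legal
    1: {0: None, 1: 1, 2: 2},       # then any number of 'b', then 'c'
    2: {0: None, 1: None, 2: None}, # accepting state is terminal here
}

def viterbi_constrained_block(log_probs, delta, start, accept):
    """Viterbi-style DP over DFA states: returns the most probable
    length-T token sequence that ends in the accepting state.

    log_probs: (T, V) array of per-position log-probabilities
    predicted in parallel for one block.
    """
    T, V = log_probs.shape
    # best[s] = (score of the best prefix reaching state s, its tokens)
    best = {start: (0.0, [])}
    for t in range(T):
        nxt = {}
        for s, (score, toks) in best.items():
            for v in range(V):
                s2 = delta[s][v]
                if s2 is None:
                    continue  # transition violates the regex; prune it
                cand = score + log_probs[t, v]
                if s2 not in nxt or cand > nxt[s2][0]:
                    nxt[s2] = (cand, toks + [v])
        best = nxt
    if accept not in best:
        return None  # no length-T string satisfies the constraint
    return best[accept][1]

# Example: position-wise greedy decoding would pick "bbc" (invalid
# under "ab*c"); the DP instead recovers the best valid block "abc".
probs = np.array([[0.4, 0.5, 0.1],
                  [0.3, 0.6, 0.1],
                  [0.1, 0.2, 0.7]])
tokens = viterbi_constrained_block(np.log(probs), DELTA, START, ACCEPT)
print("".join(VOCAB[t] for t in tokens))  # -> "abc"
```

The sketch shows why per-position greedy masking is not enough for parallel prediction: the locally most likely tokens can jointly violate the constraint, whereas the DP searches over whole-block paths through the automaton and keeps only reachable, eventually-accepting ones.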