DINGO: Constrained Inference for Diffusion LLMs
May 29, 2025
Authors: Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
cs.AI
Abstract
Diffusion LLMs have emerged as a promising alternative to conventional
autoregressive LLMs, offering significant potential for improved runtime
efficiency. However, existing diffusion models lack the ability to provably
enforce user-specified formal constraints, such as regular expressions, which
makes them unreliable for tasks that require structured outputs, such as
fixed-schema JSON generation. Unlike autoregressive models that generate tokens
sequentially, diffusion LLMs predict a block of tokens in parallel. This
parallelism makes traditional constrained decoding algorithms, which are
designed for sequential token prediction, ineffective at preserving the true
output distribution. To address this limitation, we propose DINGO, a dynamic
programming-based constrained decoding strategy that is both efficient and
provably distribution-preserving. DINGO enables sampling of output strings with
the highest probability under the model's predicted distribution, while
strictly satisfying any user-specified regular expression. On standard symbolic
math and JSON generation benchmarks, DINGO achieves up to a 68 percentage point
improvement over unconstrained inference.
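To make the core idea concrete, below is a minimal, illustrative sketch, not the authors' implementation: given the per-position token distributions that a diffusion LLM predicts in parallel for a block, a Viterbi-style dynamic program over the states of the regular expression's DFA recovers the highest-probability token sequence the DFA accepts. The function name, the explicit transition dictionary, and the toy vocabulary are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

def dingo_style_decode(
    log_probs: List[Dict[str, float]],      # per-position token log-probs
    dfa_step: Dict[Tuple[int, str], int],   # (state, token) -> next state
    start_state: int,
    accept_states: Set[int],
) -> List[str]:
    """Return the highest-probability fixed-length token sequence
    accepted by the DFA, via a Viterbi-style DP over DFA states."""
    # best[state] = (log-prob of best prefix reaching `state`, its tokens)
    best: Dict[int, Tuple[float, List[str]]] = {start_state: (0.0, [])}
    for dist in log_probs:  # one parallel-predicted distribution per position
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for state, (score, toks) in best.items():
            for tok, lp in dist.items():
                ns = dfa_step.get((state, tok))
                if ns is None:  # no valid DFA transition for this token here
                    continue
                cand_score = score + lp
                if ns not in nxt or cand_score > nxt[ns][0]:
                    nxt[ns] = (cand_score, toks + [tok])
        best = nxt
    accepted = {s: v for s, v in best.items() if s in accept_states}
    if not accepted:
        raise ValueError("no string of this length satisfies the regex")
    return max(accepted.values(), key=lambda v: v[0])[1]

# Toy check: regex (ab)* over a 2-token block with vocabulary {a, b}.
# Per-position greedy decoding would pick "a" then "a" (each locally most
# likely) and violate the regex; the DP returns the best accepted string.
step = {(0, "a"): 1, (1, "b"): 0}        # hand-built DFA for (ab)*; accepts in state 0
dists = [{"a": -0.1, "b": -2.3},          # log-probs, position 1
         {"a": -0.2, "b": -1.6}]          # log-probs, position 2
print(dingo_style_decode(dists, step, 0, {0}))  # -> ['a', 'b']
```

This sketches why a whole-sequence DP matters for parallel block prediction: masking invalid tokens position by position can commit to a locally likely token that drives the DFA into a dead state, whereas the DP scores complete sequences and falls back to the globally best string that satisfies the constraint.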