DINGO: Constrained Inference for Diffusion LLMs

Abstract

Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models lack the ability to provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models, which generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, which are designed for sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model's predicted distribution, while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves an improvement of up to 68 percentage points over unconstrained inference.
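To make the idea of dynamic programming over a parallel block of tokens concrete, the following is a minimal Viterbi-style sketch and not the authors' implementation. It assumes that the model emits an independent probability distribution for each position in the block and that the regular expression has already been compiled into a deterministic finite automaton (DFA). The function name `constrained_argmax`, the toy DFA for `a*b+`, and the probabilities are all hypothetical and chosen for illustration. The sketch also treats each token as a single DFA symbol, which real tokenizers do not guarantee.

```python
import math

def constrained_argmax(probs, delta, start, accepting):
    """Return the most probable token sequence accepted by a DFA.

    probs:     list of dicts (token -> probability), one per block position
    delta:     dict mapping (state, token) -> next state; missing key = rejected
    start:     initial DFA state
    accepting: set of accepting DFA states
    """
    # best[q] = (log-probability, tokens) of the best prefix that ends in state q
    best = {start: (0.0, [])}
    for dist in probs:
        nxt = {}
        for q, (lp, toks) in best.items():
            for tok, p in dist.items():
                q2 = delta.get((q, tok))
                if p <= 0.0 or q2 is None:
                    continue  # zero-probability token or invalid transition
                cand = lp + math.log(p)
                if q2 not in nxt or cand > nxt[q2][0]:
                    nxt[q2] = (cand, toks + [tok])
        best = nxt
    finals = [v for q, v in best.items() if q in accepting]
    if not finals:
        return None, 0.0  # no string of this length satisfies the constraint
    lp, toks = max(finals, key=lambda v: v[0])
    return toks, math.exp(lp)

# Hypothetical DFA for the regular expression a*b+ (state 1 is accepting).
delta = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}

# Hypothetical per-position distributions for a block of three tokens.
probs = [
    {"a": 0.4, "b": 0.6},
    {"a": 0.3, "b": 0.7},
    {"a": 0.8, "b": 0.2},
]

# Unconstrained greedy decoding picks the top token at every position.
greedy = "".join(max(d, key=d.get) for d in probs)

tokens, prob = constrained_argmax(probs, delta, start=0, accepting={1})
print(greedy)            # bba, which does not match a*b+
print("".join(tokens))   # bbb, the most probable string that does match
```

In this toy setting the per-position greedy choice yields `bba`, which violates the constraint, while the dynamic program keeps one best prefix per DFA state and recovers `bbb`, the highest-probability valid string. The cost is proportional to block length times the number of DFA states times the vocabulary size, which is why tracking automaton states rather than enumerating strings keeps the search tractable.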
