Esoteric Language Models

Abstract

Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) achieve the strongest performance, but they still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the **first to introduce KV caching for MDMs** while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to **65x** faster inference than standard MDMs and **4x** faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: [http://s-sahoo.github.io/Eso-LMs](http://s-sahoo.github.io/Eso-LMs)
