Esoteric Language Models
June 2, 2025
Authors: Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
cs.AI
Abstract
Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models (MDMs) achieve the strongest performance but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the **first to introduce KV caching for MDMs** while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to **65x** faster inference than standard MDMs and **4x** faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: [http://s-sahoo.github.io/Eso-LMs](http://s-sahoo.github.io/Eso-LMs)
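
The headline efficiency result above is KV caching for MDMs without giving up parallel unmasking. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: `ToyDenoiser`, its `past_kv` interface, and the unmask-half-the-remaining-positions-per-step rule are all illustrative assumptions. It caches keys and values only for a committed prefix of already-decoded tokens, so each denoising step re-encodes just the remaining suffix while still filling in several masked positions at once.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the authors' released code) of combining a KV
# cache with parallel unmasking: a growing prefix of decoded tokens is
# frozen in the cache and never recomputed, while several masked positions
# are still unmasked in parallel at each denoising step.

class ToyDenoiser(nn.Module):
    """Stand-in denoiser: predicts a token distribution at every position."""
    def __init__(self, vocab=128, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, past_kv=None):
        h = self.emb(tokens)                      # (1, suffix_len, dim)
        # Prepend cached keys/values of the committed prefix so its
        # compute is never repeated across denoising steps.
        kv = h if past_kv is None else torch.cat([past_kv, h], dim=1)
        y, _ = self.attn(h, kv, kv)               # queries: suffix only
        return self.out(y), kv

@torch.no_grad()
def sample(model, seq_len=16, steps=8, mask_id=0):
    x = torch.full((1, seq_len), mask_id)         # start fully masked
    cache, committed = None, 0                    # KV cache of the clean prefix
    for _ in range(steps):
        if not (x == mask_id).any():              # fully decoded
            break
        base = committed
        # Feed only the uncommitted suffix; the cache stands in for the prefix.
        logits, kv = model(x[:, base:], past_kv=cache)
        # Freeze keys/values for the leading run of tokens decoded in earlier
        # steps: their ids are final, so their cached entries are too.
        run = 0
        while base + run < seq_len and x[0, base + run] != mask_id:
            run += 1
        cache, committed = kv[:, : base + run], base + run
        # Unmask several high-confidence positions in parallel this step.
        logits[..., mask_id] = float("-inf")      # never predict the mask token
        conf, tok = logits.softmax(-1).max(-1)
        masked = x[0, base:] == mask_id
        k = max(1, int(masked.sum()) // 2)
        idx = conf[0].masked_fill(~masked, float("-inf")).topk(k).indices
        x[0, base + idx] = tok[0, idx]
    return x

print(sample(ToyDenoiser())[0])
```

The toy sidesteps the hard part: its cached entries are raw token embeddings, so they never go stale once a token is fixed. In a full bidirectional transformer, a decoded token's keys and values depend on the rest of the still-changing sequence, which is why standard MDMs could not use KV caching and what makes the result claimed in the abstract nontrivial.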