Esoteric Language Models
June 2, 2025
Authors: Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
cs.AI
Abstract
Diffusion-based language models offer a compelling alternative to
autoregressive (AR) models by enabling parallel and controllable generation.
Among this family of models, Masked Diffusion Models (MDMs) achieve the
strongest performance but still underperform AR models in perplexity and lack
key inference-time efficiency features, most notably KV caching. In this work,
we introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms,
enabling smooth interpolation between their perplexities while overcoming their
respective limitations. Eso-LMs set a new state of the art on standard language
modeling benchmarks. Crucially, we are the **first to introduce KV caching for
MDMs** while preserving parallel generation, significantly improving inference
efficiency. Combined with an optimized sampling schedule, our method achieves
up to **65x** faster inference than standard MDMs and **4x** faster inference
than prior semi-autoregressive approaches. We provide the code and model
checkpoints on the project page:
[http://s-sahoo.github.io/Eso-LMs](http://s-sahoo.github.io/Eso-LMs)
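
To make the KV-caching claim concrete, below is a minimal, self-contained PyTorch sketch, written as our own illustration rather than the paper's implementation. It shows a toy transformer layer that reuses cached token states when positions are committed in a fixed order, which is the property plain MDMs lack: their bidirectional attention invalidates cached keys and values at every denoising step. All names here (`TinyCausalLM`, `denoise_with_cache`) are hypothetical, attention masks are omitted for brevity, and the greedy one-token-per-step loop stands in for Eso-LMs' parallel sampling schedule.

```python
# Hypothetical sketch: why a fixed commitment order enables KV caching.
# Not the authors' code; names and architecture are illustrative only.

import torch
import torch.nn as nn

class TinyCausalLM(nn.Module):
    """Toy single-layer transformer exposing a reusable token-state cache.
    (A real KV cache stores projected keys/values per layer; this toy caches
    raw token states and lets MultiheadAttention project them internally.)"""
    def __init__(self, vocab=128, d=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, ids, cache=None):
        x = self.emb(ids)  # (B, T_new, d): only the newly committed tokens
        kv = x if cache is None else torch.cat([cache, x], dim=1)
        h, _ = self.attn(x, kv, kv)  # new queries attend to cached + new states
        return self.out(h), kv       # logits and the updated cache

@torch.no_grad()
def denoise_with_cache(model, prompt, n_new=8):
    """Commit positions left-to-right, reusing the cache for finalized tokens.
    A plain MDM's bidirectional attention would force recomputing every
    key/value at each denoising step; a fixed order keeps the cache valid."""
    ids = prompt
    logits, cache = model(ids)                 # encode the prompt once
    for _ in range(n_new):
        nxt = logits[:, -1:].argmax(-1)        # greedy commit (placeholder)
        logits, cache = model(nxt, cache)      # only the new token is encoded
        ids = torch.cat([ids, nxt], dim=1)
    return ids

ids = denoise_with_cache(TinyCausalLM(), torch.randint(0, 128, (1, 4)))
print(ids.shape)  # torch.Size([1, 12]): 4 prompt tokens + 8 generated
```

Note that this toy commits exactly one token per step, like an AR decoder; the paper's contribution is preserving cache reuse while still filling multiple masked positions in parallel per denoising step.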