Esoteric Language Models

June 2, 2025
Authors: Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
cs.AI

Abstract

Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models (MDMs) achieve the strongest performance but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the **first to introduce KV caching for MDMs** while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to **65x** faster inference than standard MDMs and **4x** faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: [http://s-sahoo.github.io/Eso-LMs](http://s-sahoo.github.io/Eso-LMs)
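To make the abstract's central claim concrete, the sketch below shows how a KV cache can coexist with block-parallel unmasking. This is only an illustration under simplifying assumptions, not the Eso-LMs algorithm: `ToyLM`, `MASK_ID`, and `sample` are hypothetical names, the model is a single attention layer without positional encodings or causal masking, and each step greedily fills a fixed-size block of masked positions while reusing the cached keys/values of already-committed tokens.

```python
# Hedged sketch: a KV cache combined with parallel unmasking.
# NOT the Eso-LMs method; a toy illustration of the general idea only.
import torch
import torch.nn.functional as F

VOCAB, MASK_ID, DIM = 32, 0, 16

class ToyLM(torch.nn.Module):
    """One attention layer standing in for a full transformer
    (positional encodings and causal masking omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.q = torch.nn.Linear(DIM, DIM)
        self.k = torch.nn.Linear(DIM, DIM)
        self.v = torch.nn.Linear(DIM, DIM)
        self.out = torch.nn.Linear(DIM, VOCAB)

    def forward(self, tokens, cache=None):
        x = self.emb(tokens)                          # (B, T, D)
        q, k, v = self.q(x), self.k(x), self.v(x)
        if cache is not None:                         # reuse keys/values of
            k = torch.cat([cache["k"], k], dim=1)     # already-decoded tokens
            v = torch.cat([cache["v"], v], dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / DIM ** 0.5, dim=-1)
        return self.out(attn @ v), {"k": k, "v": v}

@torch.no_grad()
def sample(model, prompt, n_new=8, parallel=4):
    """Generate `n_new` tokens, unmasking `parallel` positions per step."""
    tokens = prompt
    _, cache = model(tokens)                          # prime the cache on the prompt
    while n_new > 0:
        block = min(parallel, n_new)
        masked = torch.full((1, block), MASK_ID, dtype=torch.long)
        logits, _ = model(masked, cache)              # masks attend to cached context
        new = logits.argmax(-1)                       # greedy decode, for simplicity
        _, cache = model(new, cache)                  # re-encode the sampled tokens
        tokens = torch.cat([tokens, new], dim=1)      # into the cache
        n_new -= block
    return tokens

print(sample(ToyLM(), torch.randint(1, VOCAB, (1, 5))))
```

The extra forward pass that re-encodes sampled tokens into the cache is a deliberate simplification here; it highlights the core difficulty that caching poses for MDMs, which the paper addresses with its fused AR/MDM design.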