Continuous Autoregressive Language Models
October 31, 2025
Authors: Chenze Shao, Darren Li, Fandong Meng, Jie Zhou
cs.AI
Abstract
The efficiency of large language models (LLMs) is fundamentally limited by
their sequential, token-by-token generation process. We argue that overcoming
this bottleneck requires a new design axis for LLM scaling: increasing the
semantic bandwidth of each generative step. To this end, we introduce
Continuous Autoregressive Language Models (CALM), a paradigm shift from
discrete next-token prediction to continuous next-vector prediction. CALM uses
a high-fidelity autoencoder to compress a chunk of K tokens into a single
continuous vector, from which the original tokens can be reconstructed with
over 99.9% accuracy. This allows us to model language as a sequence of
continuous vectors instead of discrete tokens, which reduces the number of
generative steps by a factor of K. The paradigm shift necessitates a new
modeling toolkit; therefore, we develop a comprehensive likelihood-free
framework that enables robust training, evaluation, and controllable sampling
in the continuous domain. Experiments show that CALM significantly improves the
performance-compute trade-off, matching the performance of strong discrete
baselines at a substantially lower computational cost. More importantly, these
findings establish next-vector prediction as a powerful and scalable pathway
towards ultra-efficient language models. Code:
https://github.com/shaochenze/calm. Project:
https://shaochenze.github.io/blog/2025/CALM.
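
To make the next-vector idea concrete, below is a minimal PyTorch sketch of the two components the abstract describes: an autoencoder that compresses a chunk of K tokens into one continuous vector and reconstructs the tokens from it, and a causal model that predicts the next chunk vector from the preceding ones. All module names, dimensions, and the plain Transformer backbone are illustrative assumptions for exposition, not the paper's actual architecture or its likelihood-free training objective.

import torch
import torch.nn as nn

K, VOCAB, D_TOK, D_VEC = 4, 32000, 256, 512   # chunk size, vocab, embedding dims (assumed)

class ChunkAutoencoder(nn.Module):
    """Compresses K token ids into a single vector and reconstructs K token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_TOK)
        self.enc = nn.Sequential(nn.Linear(K * D_TOK, D_VEC), nn.GELU(), nn.Linear(D_VEC, D_VEC))
        self.dec = nn.Sequential(nn.Linear(D_VEC, K * D_TOK), nn.GELU())
        self.head = nn.Linear(D_TOK, VOCAB)

    def encode(self, tokens):                    # tokens: (batch, K)
        return self.enc(self.embed(tokens).flatten(1))   # -> (batch, D_VEC)

    def decode(self, vec):                       # vec: (batch, D_VEC)
        h = self.dec(vec).view(-1, K, D_TOK)
        return self.head(h)                      # -> (batch, K, VOCAB) logits

class NextVectorPredictor(nn.Module):
    """Causal Transformer over chunk vectors: each position predicts the next vector."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=D_VEC, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, vecs):                     # vecs: (batch, T, D_VEC)
        T = vecs.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return self.backbone(vecs, mask=causal)

# Toy usage: 8 chunks of K tokens each -> 8 generative steps instead of 8*K.
tokens = torch.randint(0, VOCAB, (2, 8 * K))
ae, lm = ChunkAutoencoder(), NextVectorPredictor()
vecs = ae.encode(tokens.view(2 * 8, K)).view(2, 8, D_VEC)
pred = lm(vecs[:, :-1])                          # predict vectors 2..8 from their prefixes
logits = ae.decode(pred.reshape(-1, D_VEC))      # map predicted vectors back to token logits
print(logits.shape)                              # torch.Size([14, 4, 32000])

In this toy setup the language model only ever operates on chunk vectors, so its sequence length (and number of generative steps) shrinks by a factor of K; how the predicted vector is trained and sampled without a discrete likelihood is exactly what the paper's likelihood-free framework addresses.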