
Continuous Autoregressive Language Models

October 31, 2025
Authors: Chenze Shao, Darren Li, Fandong Meng, Jie Zhou
cs.AI

Abstract

The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. The paradigm shift necessitates a new modeling toolkit; therefore, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, achieving the performance of strong discrete baselines at a significantly lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
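
To make the paradigm concrete, below is a minimal, self-contained PyTorch sketch of the two components the abstract describes: a chunk autoencoder that compresses K tokens into one continuous vector and reconstructs them, and an autoregressive model that predicts the next chunk vector from the preceding ones. All module names, layer choices, and toy sizes (K, VOCAB, D_TOK, D_VEC) are illustrative assumptions, not the paper's architecture; the paper's likelihood-free training objective and sampling procedure are not reproduced here. See the linked repository for the actual implementation.

```python
# Hypothetical sketch of the CALM idea: encode chunks of K tokens into continuous
# vectors, model the vector sequence autoregressively, decode predictions back to tokens.
import torch
import torch.nn as nn

K = 4        # tokens per chunk (hyperparameter; toy value)
VOCAB = 1000 # toy vocabulary size
D_TOK = 64   # toy token embedding width
D_VEC = 256  # toy continuous-vector width


class ChunkAutoencoder(nn.Module):
    """Compress K token ids into one vector; reconstruct per-token logits from it."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_TOK)
        self.encoder = nn.Sequential(
            nn.Linear(K * D_TOK, D_VEC), nn.GELU(), nn.Linear(D_VEC, D_VEC))
        self.decoder = nn.Sequential(
            nn.Linear(D_VEC, D_VEC), nn.GELU(), nn.Linear(D_VEC, K * VOCAB))

    def encode(self, token_ids):            # (B, K) -> (B, D_VEC)
        x = self.embed(token_ids).flatten(1)
        return self.encoder(x)

    def decode_logits(self, z):             # (B, D_VEC) -> (B, K, VOCAB)
        return self.decoder(z).view(-1, K, VOCAB)


class NextVectorModel(nn.Module):
    """Causal model over chunk vectors: predict vector t+1 from vectors <= t."""

    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=D_VEC, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_VEC, D_VEC)

    def forward(self, vecs):                # (B, T, D_VEC) -> (B, T, D_VEC)
        T = vecs.size(1)
        # Causal mask so each position only attends to earlier chunk vectors.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(vecs, mask=mask)
        return self.head(h)                 # prediction of the next chunk vector


if __name__ == "__main__":
    ae, lm = ChunkAutoencoder(), NextVectorModel()
    tokens = torch.randint(0, VOCAB, (2, 3 * K))   # batch of 2 sequences, 3 chunks each
    chunks = tokens.view(2, 3, K)
    vecs = torch.stack([ae.encode(chunks[:, t]) for t in range(3)], dim=1)  # (2, 3, D_VEC)
    pred = lm(vecs)                                 # predicted next chunk vectors
    recon_logits = ae.decode_logits(pred[:, -1])    # decode last prediction into K tokens
    print(pred.shape, recon_logits.shape)
```

Because one generative step now emits a vector that decodes to K tokens, the number of sequential steps drops by roughly a factor of K; the trade-off is that training and sampling must operate in a continuous output space, which is what the paper's likelihood-free framework addresses.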