Continuous Autoregressive Language Models

Abstract

The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. Because this paradigm shift requires a new modeling toolkit, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, matching the performance of strong discrete baselines at a much lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
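
The step-count claim can be illustrated with a small sketch. It only shows the bookkeeping of grouping tokens into chunks of K and counting one generative step per chunk. It does not implement the autoencoder or the likelihood-free framework. The function names `chunk_tokens` and `generative_steps`, the `pad_id` value, and the choice to pad the final chunk are assumptions made for illustration, not details taken from the paper or its code.

```python
import math


def chunk_tokens(token_ids, k, pad_id=0):
    """Group token ids into consecutive chunks of k tokens.

    Padding the last chunk with pad_id is an assumption of this sketch;
    the abstract does not say how leftover tokens are handled.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    chunks = []
    for start in range(0, len(token_ids), k):
        chunk = list(token_ids[start:start + k])
        chunk += [pad_id] * (k - len(chunk))  # pad only a short final chunk
        chunks.append(chunk)
    return chunks


def generative_steps(num_tokens, k):
    """Number of autoregressive steps when each step emits one chunk of k tokens."""
    return math.ceil(num_tokens / k)


tokens = list(range(1, 11))          # 10 placeholder token ids
chunks = chunk_tokens(tokens, k=4)   # each chunk would map to one continuous vector
steps = generative_steps(len(tokens), k=4)
```

With K = 1 the count equals the number of tokens, which corresponds to ordinary next-token generation. Larger K reduces the count roughly in proportion, which is the factor-of-K reduction the abstract describes.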