Continuous Autoregressive Language Models

Abstract

The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. Because this shift requires a new modeling toolkit, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, matching the performance of strong discrete baselines at a substantially lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
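The step-count argument can be illustrated with a minimal sketch. It only shows how grouping tokens into chunks of K shrinks the number of autoregressive steps; it does not implement the autoencoder or the likelihood-free training described in the abstract. The names `chunk_tokens`, `generative_steps`, and `PAD_ID`, as well as the choice to pad the final chunk, are assumptions made for illustration and are not taken from the CALM code base.

```python
import math
from typing import List

PAD_ID = 0  # hypothetical padding token id (an assumption, not from the paper)


def chunk_tokens(token_ids: List[int], k: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of k tokens.

    The last chunk is right-padded with PAD_ID so every chunk has length k.
    In a CALM-style model, each chunk would be compressed into one vector.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    chunks = []
    for start in range(0, len(token_ids), k):
        chunk = token_ids[start:start + k]
        chunk = chunk + [PAD_ID] * (k - len(chunk))
        chunks.append(chunk)
    return chunks


def generative_steps(num_tokens: int, k: int) -> int:
    """Number of autoregressive steps when each step emits k tokens."""
    return math.ceil(num_tokens / k)


tokens = list(range(1, 11))  # 10 illustrative token ids
print(chunk_tokens(tokens, k=4))
print(generative_steps(len(tokens), k=1), generative_steps(len(tokens), k=4))
```

With K = 1 the count equals the token count, which corresponds to ordinary next-token prediction; larger K trades fewer steps for more information carried per step, which is the semantic bandwidth axis the abstract refers to.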