Introspective Diffusion Language Models

Abstract

Diffusion language models (DLMs) promise parallel generation, yet they still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which lets the model verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand for large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
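To make the introspective acceptance rate concrete, here is a minimal Python sketch. The abstract does not give the exact definition, so the sketch assumes that a token counts as "accepted" when it is still the model's greedy (argmax) choice once the model re-scores its own output. The function name and the toy logits are hypothetical and serve only as illustration; the paper's actual criterion may differ, for example a probabilistic acceptance test.

```python
from typing import Sequence


def introspective_acceptance_rate(
    generated: Sequence[int],
    rescored_logits: Sequence[Sequence[float]],
) -> float:
    """Fraction of generated tokens the model still picks when re-scoring.

    ASSUMPTION (not from the paper): "accepts" means greedy agreement,
    i.e. the argmax of the re-scored logits at position i equals
    generated[i].
    """
    if len(generated) != len(rescored_logits):
        raise ValueError("need one logit row per generated token")
    if not generated:
        return 0.0
    accepted = 0
    for token, row in zip(generated, rescored_logits):
        # Index of the highest-scoring vocabulary entry at this position.
        best = max(range(len(row)), key=row.__getitem__)
        accepted += int(best == token)
    return accepted / len(generated)


# Toy example: 4 generated tokens over a 3-token vocabulary (made-up numbers).
generated = [2, 0, 1, 1]
rescored_logits = [
    [0.1, 0.2, 3.0],  # argmax 2, agrees with generated token 2
    [2.5, 0.3, 0.1],  # argmax 0, agrees with generated token 0
    [1.9, 0.4, 0.2],  # argmax 0, disagrees with generated token 1
    [0.0, 1.2, 0.3],  # argmax 1, agrees with generated token 1
]
rate = introspective_acceptance_rate(generated, rescored_logits)
print(rate)  # 3 of 4 tokens accepted
```

Under this assumed definition, a model that always agrees with its own output scores 1.0, and lower values indicate the kind of self-disagreement the abstract attributes to DLMs.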
