Introspective Diffusion Language Models

Abstract

Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This metric reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which lets the model verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand for large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
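The abstract describes the introspective acceptance rate only in words, so the following is a minimal sketch of one plausible reading, not the paper's formal definition. It assumes a token counts as "accepted" when a re-scoring pass over the finished sequence predicts the same token at that position. The names introspective_acceptance_rate, Rescorer, and toy_rescore are hypothetical and exist only for illustration; toy_rescore is a stand-in for a real model.

```python
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not from the paper): given the full
# token sequence, return the model's preferred token at every position.
Rescorer = Callable[[Sequence[int]], Sequence[int]]


def introspective_acceptance_rate(generated: Sequence[int], rescore: Rescorer) -> float:
    """Fraction of previously generated tokens the model would still choose.

    Assumption: a token is "accepted" when the re-scored prediction at its
    position equals the token that was originally generated.
    """
    if not generated:
        raise ValueError("generated must contain at least one token")
    predicted = rescore(generated)
    if len(predicted) != len(generated):
        raise ValueError("rescore must return one prediction per position")
    accepted = sum(1 for g, p in zip(generated, predicted) if g == p)
    return accepted / len(generated)


# Toy stand-in for a model: it disagrees with whatever sits at position 2.
def toy_rescore(tokens: Sequence[int]) -> Sequence[int]:
    out = list(tokens)
    if len(out) > 2:
        out[2] = out[2] + 1
    return out


rate = introspective_acceptance_rate([5, 7, 9, 11], toy_rescore)
print(rate)  # 0.75: three of the four tokens are re-predicted unchanged
```

Under this reading, a rate of 1.0 means the model fully agrees with its own earlier output, which is the property the abstract attributes to AR models.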
