Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Abstract

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules that minimize the variance. Block diffusion sets a new state of the art among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. The code, model weights, and a blog post are available on the project page: https://m-arriola.com/bd3lms/
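To make the interpolation concrete, the following is a minimal toy sketch of block-wise generation, not the authors' implementation. It assumes a masking-style discrete diffusion within each block; `MASK`, `toy_denoiser`, `sample_block`, and `generate` are hypothetical names, and the "denoiser" simply draws random token ids in place of a trained network. Blocks are produced one after another (autoregressive across blocks), while the tokens inside a block are revealed several at a time (parallel sampling within a block).

```python
import random

MASK = -1  # hypothetical id for a masked (fully noised) token


def toy_denoiser(context, block, vocab_size, rng):
    """Hypothetical stand-in for a trained model.

    Proposes a token for every masked position in `block`. A real model
    would condition on `context` (the previously generated blocks); this
    toy version ignores it and samples uniformly at random.
    """
    return [rng.randrange(vocab_size) if tok == MASK else tok for tok in block]


def sample_block(context, block_size, num_steps, vocab_size, rng):
    """Denoise one block, starting from all-mask, in at most `num_steps` steps."""
    block = [MASK] * block_size
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        proposal = toy_denoiser(context, block, vocab_size, rng)
        # Reveal an even share of the remaining masked positions per step
        # (ceiling division), so the final step always unmasks the rest.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)
        for i in rng.sample(masked, k):
            block[i] = proposal[i]
    return block


def generate(num_blocks, block_size, num_steps=4, vocab_size=100, seed=0):
    """Generate `num_blocks` blocks left to right, each conditioned on the prefix."""
    rng = random.Random(seed)
    context = []  # in a real model, keys/values for these tokens could be cached
    for _ in range(num_blocks):
        context.extend(sample_block(context, block_size, num_steps, vocab_size, rng))
    return context
```

In this sketch, `block_size=1` reduces to generating one token at a time, as in an autoregressive model, while `num_blocks=1` denoises the whole sequence at once, as in a standard diffusion language model. Because the loop can keep appending blocks, the output length is not fixed in advance.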
