Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
March 12, 2025
Authors: Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
cs.AI
Abstract
Diffusion language models offer unique benefits over autoregressive models
due to their potential for parallelized generation and controllability, yet
they lag in likelihood modeling and are limited to fixed-length generation. In
this work, we introduce a class of block diffusion language models that
interpolate between discrete denoising diffusion and autoregressive models.
Block diffusion overcomes key limitations of both approaches by supporting
flexible-length generation and improving inference efficiency with KV caching
and parallel token sampling. We propose a recipe for building effective block
diffusion models that includes an efficient training algorithm, estimators of
gradient variance, and data-driven noise schedules to minimize the variance.
Block diffusion sets a new state of the art among diffusion models
on language modeling benchmarks and enables generation of arbitrary-length
sequences. We provide the code, along with the model weights and blog post on
the project page: https://m-arriola.com/bd3lms/
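To make the block-autoregressive, within-block-parallel structure described above concrete, the following is a minimal sketch of block diffusion sampling: blocks are generated left to right, while tokens inside each block are denoised in parallel conditioned on the previously generated context. The denoiser interface (`denoise_fn`), mask-token convention, block size, and unmasking schedule here are illustrative assumptions for exposition, not the released BD3-LM implementation.

```python
# Conceptual sketch only: a block-autoregressive sampler with parallel
# within-block denoising. All names and the schedule are assumptions.
import torch

MASK_ID = 0  # hypothetical id of the absorbing (mask) token


def sample_block_diffusion(denoise_fn, num_blocks=4, block_size=16, num_steps=8):
    """Generate a sequence block by block.

    Blocks are produced autoregressively (left to right); tokens inside a
    block are denoised in parallel, conditioned on all previous blocks.
    In a real implementation the context's keys/values would be cached
    across blocks rather than recomputed.
    """
    generated = torch.empty(0, dtype=torch.long)  # grows one block at a time
    for _ in range(num_blocks):
        # Start the new block fully masked (absorbing-state diffusion).
        block = torch.full((block_size,), MASK_ID, dtype=torch.long)
        for step in reversed(range(1, num_steps + 1)):
            t = step / num_steps  # noise level in (0, 1]
            # Predict clean tokens for the whole block in parallel,
            # conditioned on the previously generated context.
            logits = denoise_fn(block, t, generated)  # (block_size, vocab)
            probs = torch.softmax(logits, dim=-1)
            sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
            # Reveal a growing fraction of still-masked positions as the
            # noise level drops; already-revealed tokens stay fixed.
            is_masked = block == MASK_ID
            reveal = is_masked & (torch.rand(block_size) >= (step - 1) / num_steps)
            block = torch.where(reveal, sampled, block)
        generated = torch.cat([generated, block])
    return generated


if __name__ == "__main__":
    # Toy stand-in denoiser: uniform logits over a small vocabulary.
    vocab_size = 32
    dummy = lambda blk, t, ctx: torch.zeros(blk.shape[0], vocab_size)
    print(sample_block_diffusion(dummy))
```

Because each block is conditioned only on completed earlier blocks, this structure admits KV caching of the context and arbitrary-length generation (keep appending blocks), while the per-block denoising loop samples many tokens in parallel.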