Continuous Diffusion Model for Language Modeling
February 17, 2025
Authors: Jaehyeong Jo, Sung Ju Hwang
cs.AI
Abstract
Diffusion models have emerged as a promising alternative to autoregressive
models in modeling discrete categorical data. Yet diffusion models that
directly work on discrete data space do not fully exploit the power of
iterative refinement, as the signals are lost during the transition between
discrete states. Existing continuous diffusion models for discrete data have
limited performance compared to discrete approaches, and the unclear link
between them restricts the development of diffusion models for discrete data.
In this work, we propose a continuous diffusion model for language modeling
that incorporates the geometry of the underlying categorical distribution. We
establish a connection between the discrete diffusion and continuous flow on
the statistical manifold, and building on the analogy, we introduce a simple
design for the diffusion process that generalizes previous discrete diffusion
models. We further propose a simulation-free training framework based on radial
symmetry and a simple technique to address the high dimensionality of the
manifold. Comprehensive experiments on language modeling benchmarks and other
modalities show that our method outperforms existing discrete diffusion models
and approaches the performance of autoregressive models. Code is available at
https://github.com/harryjo97/RDLM.
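The abstract's key idea rests on the geometry of the underlying categorical distribution. A minimal sketch, not the paper's code, of a standard fact behind this geometry: under the square-root map u = sqrt(p), the statistical manifold of categorical distributions embeds into the positive orthant of the unit hypersphere, and the Fisher-Rao distance becomes (twice) the spherical geodesic distance. The function names below are illustrative, not from the released repository.

```python
import numpy as np

def to_sphere(p):
    """Map a categorical distribution p (non-negative, sums to 1)
    onto the unit hypersphere via the square-root transform."""
    p = np.asarray(p, dtype=float)
    assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)
    return np.sqrt(p)  # lies on the sphere: sum(sqrt(p)**2) = sum(p) = 1

def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two categorical distributions,
    computed as twice the geodesic (arc) distance on the sphere."""
    u, v = to_sphere(p), to_sphere(q)
    cos_angle = np.clip(np.dot(u, v), -1.0, 1.0)  # guard against rounding
    return 2.0 * np.arccos(cos_angle)

# Example: distance between two 3-category distributions.
print(fisher_rao_distance([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]))
```

This spherical picture is what makes a flow-based view of discrete diffusion natural: moving between one-hot corners of the simplex corresponds to geodesic motion on the sphere, along which intermediate states retain signal rather than jumping between discrete states.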