Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
July 1, 2024
作者: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann
cs.AI
Abstract
This paper presents Diffusion Forcing, a new training paradigm where a
diffusion model is trained to denoise a set of tokens with independent
per-token noise levels. We apply Diffusion Forcing to sequence generative
modeling by training a causal next-token prediction model to generate one or
several future tokens without fully diffusing past ones. Our approach is shown
to combine the strengths of next-token prediction models, such as
variable-length generation, with the strengths of full-sequence diffusion
models, such as the ability to guide sampling to desirable trajectories. Our
method offers a range of additional capabilities, such as (1) rolling-out
sequences of continuous tokens, such as video, with lengths past the training
horizon, where baselines diverge and (2) new sampling and guiding schemes that
uniquely profit from Diffusion Forcing's variable-horizon and causal
architecture, and which lead to marked performance gains in decision-making and
planning tasks. In addition to its empirical success, our method is proven to
optimize a variational lower bound on the likelihoods of all subsequences of
tokens drawn from the true joint distribution. Project website:
https://boyuan.space/diffusion-forcing/
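The abstract's key idea, corrupting each token with its own independently sampled noise level rather than one shared level for the whole sequence, can be sketched as a toy noising step. This is a minimal illustration, not the paper's implementation: the linear noise schedule, the function name, and the token dimensions are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_token_noising(tokens, num_levels=1000):
    """Corrupt each token with an independently sampled noise level.

    Hypothetical sketch of the Diffusion Forcing noising step: standard
    full-sequence diffusion would draw a single level k for all tokens,
    whereas here each token t gets its own k[t].
    """
    T, D = tokens.shape
    # Independent noise level per token (the paper's central idea).
    k = rng.integers(0, num_levels, size=T)
    # A simple linear schedule, chosen here for illustration only:
    # alpha is the per-token signal fraction remaining after noising.
    alpha = 1.0 - k / num_levels
    eps = rng.standard_normal((T, D))  # Gaussian noise, one sample per token
    noisy = np.sqrt(alpha)[:, None] * tokens + np.sqrt(1.0 - alpha)[:, None] * eps
    return noisy, k, eps

# Usage: a toy sequence of 8 continuous tokens with 4 features each.
# A causal denoiser would then be trained to predict eps (or the clean
# token) given the noisy sequence and the per-token levels k.
tokens = rng.standard_normal((8, 4))
noisy, k, eps = per_token_noising(tokens)
print(noisy.shape, k.shape)
```

Because past tokens can be kept at low (or zero) noise while future tokens carry high noise, a causal model trained this way can roll out new tokens without re-diffusing the past, which is the property the abstract highlights.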