Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
July 1, 2024
Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann
cs.AI
Abstract
This paper presents Diffusion Forcing, a new training paradigm where a
diffusion model is trained to denoise a set of tokens with independent
per-token noise levels. We apply Diffusion Forcing to sequence generative
modeling by training a causal next-token prediction model to generate one or
several future tokens without fully diffusing past ones. Our approach is shown
to combine the strengths of next-token prediction models, such as
variable-length generation, with the strengths of full-sequence diffusion
models, such as the ability to guide sampling to desirable trajectories. Our
method offers a range of additional capabilities, such as (1) rolling out
sequences of continuous tokens, such as video, with lengths past the training
horizon, where baselines diverge, and (2) new sampling and guiding schemes that
uniquely profit from Diffusion Forcing's variable-horizon and causal
architecture, and which lead to marked performance gains in decision-making and
planning tasks. In addition to its empirical success, our method is proven to
optimize a variational lower bound on the likelihoods of all subsequences of
tokens drawn from the true joint distribution. Project website:
https://boyuan.space/diffusion-forcing/
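
The core idea in the abstract, giving every token its own independent noise level, can be illustrated with a small forward-noising sketch. This is not the authors' implementation: the linear beta schedule, the function names `make_alpha_bar` and `noise_tokens_independently`, and the toy tensor shapes are all assumptions chosen for illustration. A causal denoiser would then presumably be trained to recover `eps` (or the clean tokens) from `noisy` and `levels`.

```python
import numpy as np

def make_alpha_bar(num_levels: int) -> np.ndarray:
    """Cumulative signal fraction for a linear beta schedule (an assumed choice)."""
    betas = np.linspace(1e-4, 0.02, num_levels)
    return np.cumprod(1.0 - betas)

def noise_tokens_independently(tokens, alpha_bar, rng):
    """Corrupt each token of a (T, D) sequence with its own random noise level."""
    num_tokens = tokens.shape[0]
    # One independently sampled noise level per token (hypothetical uniform sampling).
    levels = rng.integers(0, len(alpha_bar), size=num_tokens)
    a = alpha_bar[levels][:, None]            # shape (T, 1), broadcasts over features
    eps = rng.standard_normal(tokens.shape)   # Gaussian noise, same shape as tokens
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps
    return noisy, levels, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)
tokens = rng.standard_normal((8, 4))          # toy sequence: 8 tokens, 4 features each
noisy, levels, eps = noise_tokens_independently(tokens, alpha_bar, rng)
```

In this sketch a level of 0 leaves a token almost clean and the highest level makes it close to pure noise, so a single training sequence can mix nearly clean past tokens with heavily noised future ones.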