Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Abstract

This paper presents Diffusion Forcing, a new training paradigm in which a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling toward desirable trajectories. Our method offers a range of additional capabilities, such as (1) rolling out sequences of continuous tokens, such as video, to lengths past the training horizon, where baselines diverge, and (2) new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture and that lead to marked performance gains in decision-making and planning tasks. In addition to its empirical success, our method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution. Project website: https://boyuan.space/diffusion-forcing/
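To make the phrase "independent per-token noise levels" concrete, the following is a minimal sketch of how a training sequence might be corrupted so that every token receives its own noise level. It is not the authors' implementation: the function names, the cosine noise schedule, and the uniform sampling of levels are all assumptions chosen for illustration, and the causal denoising model and its loss are omitted entirely.

```python
import math
import random


def cosine_alpha_bar(k, num_steps):
    """Fraction of signal kept at noise level k (assumed cosine schedule).

    k = 0 keeps the token clean; k = num_steps leaves almost pure noise.
    """
    return math.cos(0.5 * math.pi * k / num_steps) ** 2


def noise_tokens_independently(tokens, num_steps, rng):
    """Corrupt each token (a list of floats) with its own random noise level.

    Hypothetical helper: returns the noisy tokens and the level drawn per token.
    """
    noisy, levels = [], []
    for token in tokens:
        # Each token draws its level independently of its neighbours
        # (uniform over 0..num_steps inclusive, an assumption of this sketch).
        k = rng.randint(0, num_steps)
        a = cosine_alpha_bar(k, num_steps)
        noisy.append(
            [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in token]
        )
        levels.append(k)
    return noisy, levels


if __name__ == "__main__":
    rng = random.Random(0)
    sequence = [[1.0, 2.0] for _ in range(6)]  # six toy tokens of dimension 2
    noisy, levels = noise_tokens_independently(sequence, num_steps=10, rng=rng)
    for k, token in zip(levels, noisy):
        print(k, token)
```

Under these assumptions, a single sequence can mix nearly clean tokens with heavily noised ones, which is the setting the abstract describes: a model trained on such inputs could be asked to denoise future tokens while past tokens are only partially diffused.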
