
Planned Diffusion

October 20, 2025
Authors: Daniel Israel, Tian Jin, Ellie Cheng, Guy Van den Broeck, Aditya Grover, Suvinay Subramanian, Michael Carbin
cs.AI

Abstract

A central challenge in large language model inference is the trade-off between generation speed and output quality. Autoregressive models produce high-quality text but generate tokens sequentially. Diffusion models can generate tokens in parallel but often need many iterations to match the same quality. We propose planned diffusion, a hybrid method that combines the strengths of both paradigms. Planned diffusion works in two stages: first, the model creates a short autoregressive plan that breaks the output into smaller, independent spans. Second, the model generates these spans simultaneously using diffusion. This approach expands the speed-quality Pareto frontier and provides a practical path to faster, high-quality text generation. On AlpacaEval, a suite of 805 instruction-following prompts, planned diffusion achieves a Pareto-optimal trade-off between quality and latency, delivering 1.27x to 1.81x speedup over autoregressive generation with only a 0.87% to 5.4% drop in win rate, respectively. Our sensitivity analysis shows that the planning mechanism of planned diffusion is minimal and reliable, and simple runtime knobs exist to provide flexible control of the quality-latency trade-off.
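The two-stage control flow the abstract describes can be sketched in Python. This is a minimal illustration only, not the paper's implementation: `autoregressive_plan` and `diffusion_fill` are hypothetical stand-ins for the short sequential planning pass and the per-span diffusion decoding, and the parallelism is shown with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def autoregressive_plan(prompt: str) -> list[str]:
    # Hypothetical stage 1: a short sequential (autoregressive) pass
    # that decomposes the response into independent span descriptors.
    # In the real method this is produced token by token by the model.
    return ["intro", "body", "conclusion"]

def diffusion_fill(span_tag: str) -> str:
    # Hypothetical stage 2: each span is decoded by iterative diffusion
    # denoising, independently of the other spans. Stubbed as a string.
    return f"<{span_tag} text>"

def planned_diffusion(prompt: str) -> str:
    spans = autoregressive_plan(prompt)        # sequential, but short
    with ThreadPoolExecutor() as pool:         # spans decoded concurrently
        filled = list(pool.map(diffusion_fill, spans))
    return " ".join(filled)
```

Because the plan is short, its sequential cost is small; the latency win comes from the concurrent `pool.map` over spans, which is what expands the speed-quality Pareto frontier in the abstract's framing.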