DEER: Draft with Diffusion, Verify with Autoregressive Models

Abstract

Efficiency, a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative decoding mitigates this cost through a draft-verify scheme, yet existing approaches rely on AR draft models (a.k.a. drafters), which introduce two fundamental issues: (1) step-wise uncertainty accumulation leads to a progressive collapse of trust between the target model and the drafter, and (2) AR drafters decode inherently sequentially. Together, these factors limit the achievable speedup. In this paper, we show that diffusion large language model (dLLM) drafters can naturally overcome these issues through their fundamentally different probabilistic modeling and efficient parallel decoding strategy. Building on this insight, we introduce DEER, an efficient speculative decoding framework that drafts with diffusion and verifies with AR models. To enable high-quality drafting, DEER employs a two-stage training pipeline to align the dLLM-based drafter with the target AR model, and further adopts single-step decoding to generate long draft segments. Experiments show that DEER reaches draft acceptance lengths of up to 32 tokens, far surpassing the 10 tokens achieved by EAGLE-3. Moreover, on HumanEval with Qwen3-30B-A3B, DEER attains a 5.54x speedup, while EAGLE-3 achieves only 2.41x. Code, models, a demo, and other resources will be available at https://czc726.github.io/DEER/
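To make the draft-verify scheme concrete, the following is a minimal sketch of greedy draft verification, the generic mechanism behind the "acceptance length" figures quoted above. It is not DEER's implementation: the names `verify_draft` and `toy_target` are hypothetical, the drafter is replaced by a hard-coded token list, and the target model is a toy rule. Real systems score all draft positions in a single parallel forward pass of the target model; the sequential loop here is only for readability.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Greedy verification: keep the longest draft prefix the target agrees with.

    Returns the tokens emitted in this round: the accepted draft tokens,
    followed by one token chosen by the target itself (a correction at the
    first mismatch, or a bonus token if the whole draft was accepted).
    """
    context = list(prefix)
    emitted: List[int] = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            # First disagreement: the target's own token replaces the draft token.
            emitted.append(expected)
            return emitted
        emitted.append(token)
        context.append(token)
    # Every draft token matched, so the target contributes one extra token.
    emitted.append(target_next(context))
    return emitted


def toy_target(context: List[int]) -> int:
    """Hypothetical stand-in for an AR target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    # Two of the four draft tokens are accepted, then the target corrects the third.
    print(verify_draft([1, 2, 3], [4, 5, 9, 7], toy_target))
```

Under this scheme, each round emits at least one token, and a longer agreed prefix means fewer target-model passes per generated token, which is why a drafter with longer acceptance lengths yields a larger speedup.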
