DEER: Draft with Diffusion, Verify with Autoregressive Models

Abstract

Efficiency, a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative decoding reduces this cost with a draft-verify scheme. However, existing approaches rely on AR draft models (a.k.a. drafters), which introduce two fundamental issues: (1) step-wise uncertainty accumulation leads to a progressive collapse of trust between the target model and the drafter, and (2) AR drafters decode inherently sequentially. Together, these factors limit the achievable speedup. In this paper, we show that diffusion large language model (dLLM) drafters can naturally overcome these issues through their fundamentally different probabilistic modeling and efficient parallel decoding strategy. Building on this insight, we introduce DEER, an efficient speculative decoding framework that drafts with diffusion and verifies with AR models. To enable high-quality drafting, DEER employs a two-stage training pipeline to align dLLM-based drafters with the target AR model, and further adopts single-step decoding to generate long draft segments. Experiments show that DEER reaches draft acceptance lengths of up to 32 tokens, far surpassing the 10 tokens achieved by EAGLE-3. Moreover, on HumanEval with Qwen3-30B-A3B, DEER attains a 5.54x speedup, while EAGLE-3 achieves only 2.41x. Code, models, and demos will be available at https://czc726.github.io/DEER/
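The draft-verify scheme can be illustrated with a minimal sketch of greedy speculative decoding. It assumes a drafter that proposes a whole block of k tokens at once, as a dLLM drafter would in a single decoding step, and a target model that is queried greedily. The names `verify_draft`, `speculative_generate`, `toy_target`, and `toy_drafter` are hypothetical. The toy models are stand-ins, not DEER's trained drafter or Qwen3, and DEER's actual verification rule and two-stage training are not reproduced here. The target's own token replaces the first mismatched draft token, so the output is identical to plain greedy decoding with the target. Draft quality only changes how many tokens each verification round yields, which is the acceptance length.

```python
from typing import Callable, List, Tuple

Token = int
# Greedy next-token function of the AR target model (hypothetical interface).
TargetFn = Callable[[List[Token]], Token]
# Drafter that proposes a block of k tokens given the context (hypothetical interface).
DrafterFn = Callable[[List[Token], int], List[Token]]


def verify_draft(target_next: TargetFn, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Greedy verification: keep the longest draft prefix the target agrees with,
    then append exactly one token produced by the target itself.

    A real system scores all draft positions in one batched target forward
    pass; this loop queries the target per position only for readability.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: use the target's token and discard the rest of the draft.
            return accepted + [expected]
        accepted.append(tok)
    # Whole draft accepted: the target's next prediction is a free bonus token.
    return accepted + [target_next(prefix + accepted)]


def speculative_generate(
    drafter: DrafterFn, target_next: TargetFn, prompt: List[Token], max_new: int, k: int
) -> Tuple[List[Token], List[int]]:
    """Draft-verify loop; returns the output and the per-round acceptance lengths."""
    out = list(prompt)
    acceptance_lengths: List[int] = []
    while len(out) - len(prompt) < max_new:
        draft = drafter(out, k)  # a dLLM drafter would emit this block in parallel
        new_tokens = verify_draft(target_next, out, draft)
        acceptance_lengths.append(len(new_tokens) - 1)  # draft tokens kept this round
        out.extend(new_tokens)
    return out[: len(prompt) + max_new], acceptance_lengths


# Toy stand-ins (hypothetical): the target counts upward modulo 100.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 100


def toy_drafter(ctx: List[Token], k: int) -> List[Token]:
    # Agrees with the target everywhere except at position 2 of each block.
    block = [(ctx[-1] + i + 1) % 100 for i in range(k)]
    if k > 2:
        block[2] = -1
    return block


if __name__ == "__main__":
    tokens, lengths = speculative_generate(toy_drafter, toy_target, [0], max_new=20, k=4)
    print(tokens)   # same as greedy decoding with toy_target alone: 0..20
    print(lengths)  # 2 draft tokens accepted in each of 7 rounds
```

With the toy drafter, each round keeps two draft tokens plus one target token. The output still equals greedy decoding with `toy_target`. A drafter that matches the target over longer blocks raises the per-round acceptance length, which is the quantity compared above (up to 32 tokens for DEER versus 10 for EAGLE-3).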