Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
December 2, 2025
Authors: Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, Guokan Shang
cs.AI
Abstract
Diffusion large language models (dLLMs) offer a promising alternative to autoregressive models, but their practical utility is severely hampered by slow, iterative sampling. We present SchED, a training-free, model-agnostic early-exit algorithm that aggregates full-span logit margins and halts decoding once a smooth, progress-dependent confidence threshold is met. We evaluate SchED on two dLLM families (Dream and LLaDA), in both base and instruction-tuned variants, across ten benchmarks spanning multiple-choice question answering (MCQ), math, long-form QA/summarization, and translation. SchED delivers large, stable speedups: on instruction-tuned models, it achieves 3.8-4.0× speedups while retaining 99.8-100% of the baseline score on average; on base models, it yields consistent speedups with 99.1-100% performance retention, reaching up to 2.34× under more aggressive settings. Using a conservative speedup metric that heavily penalizes quality loss (QPS, γ=4), we show that SchED is robust and clearly outperforms prior confidence-based early-exit methods, which break down on long-form generation. An entropy analysis of the model's token predictions reveals that instruction tuning speeds up the decay of predictive entropy. By turning genuine confidence stabilization into computational savings, SchED makes dLLM decoding substantially more efficient.
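For intuition, below is a minimal Python sketch of the early-exit test the abstract describes: per-token logit margins over the full answer span are aggregated and compared against a smooth, progress-dependent threshold. The schedule shape, its parameters (tau0, tau1, power), the mean aggregator, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def schedule_threshold(progress, tau0=8.0, tau1=2.0, power=2.0):
    """Smooth, progress-dependent confidence threshold (hypothetical schedule).

    Starts strict (tau0) at progress 0 and relaxes polynomially toward tau1
    at progress 1. The paper's actual schedule may differ.
    """
    return tau1 + (tau0 - tau1) * (1.0 - progress) ** power

def should_exit(logits, step, total_steps):
    """SchED-style early-exit test (illustrative sketch).

    logits: array of shape (span_len, vocab_size) for the full answer span.
    Aggregates per-token logit margins (top-1 minus top-2 logit) over the
    whole span and exits once the mean margin clears the scheduled threshold.
    """
    top2 = np.sort(logits, axis=-1)[:, -2:]   # two largest logits per token
    margins = top2[:, 1] - top2[:, 0]         # top-1 minus top-2 margin
    progress = step / total_steps
    return margins.mean() >= schedule_threshold(progress)
```

In a diffusion sampling loop, one would call should_exit after each denoising step and, if it returns True, finalize the remaining masked positions in a single pass instead of running the scheduled steps to completion.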