ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
October 6, 2025
Authors: Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, Kangwook Lee
cs.AI
Abstract
While most autoregressive LLMs are constrained to token-by-token decoding,
diffusion LLMs (dLLMs) have attracted growing interest for their potential to
dramatically accelerate inference through parallel decoding. Despite this
promise, the conditional independence assumption in dLLMs causes parallel
decoding to ignore token dependencies, inevitably degrading generation quality
when these dependencies are strong. However, existing works largely overlook
these inherent challenges, and evaluations on standard benchmarks (e.g., math
and coding) are not sufficient to capture the quality degradation caused by
parallel decoding. To address this gap, we first provide an
information-theoretic analysis of parallel decoding. We then conduct case
studies on analytically tractable synthetic list operations from both data
distribution and decoding strategy perspectives, offering quantitative insights
that highlight the fundamental limitations of parallel decoding. Building on
these insights, we propose ParallelBench, the first benchmark specifically
designed for dLLMs, featuring realistic tasks that are trivial for humans and
autoregressive LLMs yet exceptionally challenging for dLLMs under parallel
decoding. Using ParallelBench, we systematically analyze both dLLMs and
autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can
suffer dramatic quality degradation in real-world scenarios, and (ii) current
parallel decoding strategies struggle to adapt their degree of parallelism
based on task difficulty, thus failing to achieve meaningful speedup without
compromising quality. Our findings underscore the pressing need for innovative
decoding methods that can overcome the current speed-quality trade-off. We
release our benchmark to help accelerate the development of truly efficient
dLLMs.
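
The failure mode described above can be made concrete with a toy two-token distribution. The sketch below is a hypothetical illustration, not code or notation from the paper: it compares sequential sampling from the true joint against fully parallel sampling, where each token is drawn from its exact marginal, as parallel decoding effectively does under a conditional independence assumption. Because the joint puts all its mass on matched token pairs, the product of marginals assigns half its mass to invalid outputs; the gap is the mutual information between the two tokens (1 bit here), though the paper's exact information-theoretic formulation may differ.

# Toy illustration (hypothetical, not from the paper): why parallel
# decoding under conditional independence degrades quality when tokens
# are strongly dependent.
import random

random.seed(0)

# True joint distribution over a two-token answer: only "yes yes" and
# "no no" are valid, each with probability 0.5. The tokens are perfectly
# correlated (1 bit of mutual information).
JOINT = {("yes", "yes"): 0.5, ("no", "no"): 0.5}

def sample_sequential():
    # Autoregressive-style sampling from the true joint: draw token 1,
    # then token 2 conditioned on token 1 (here, forced to match).
    first = random.choice(["yes", "no"])
    return (first, first)

def sample_parallel():
    # Parallel decoding under conditional independence: each token is
    # drawn from its own (exact) marginal, ignoring the other token.
    return (random.choice(["yes", "no"]), random.choice(["yes", "no"]))

N = 100_000
seq_ok = sum(sample_sequential() in JOINT for _ in range(N)) / N
par_ok = sum(sample_parallel() in JOINT for _ in range(N)) / N
print(f"valid outputs, sequential decoding: {seq_ok:.3f}")  # ~1.000
print(f"valid outputs, parallel decoding:   {par_ok:.3f}")  # ~0.500

Running the script prints roughly 1.000 for sequential decoding and 0.500 for parallel decoding: every marginal is exact, yet half of the factorized samples are invalid, which mirrors the speed-quality trade-off the benchmark is designed to expose.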