ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
October 6, 2025
Authors: Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, Kangwook Lee
cs.AI
Abstract
While most autoregressive LLMs are constrained to token-by-token decoding,
diffusion LLMs (dLLMs) have attracted growing interest for their potential to
dramatically accelerate inference through parallel decoding. Despite this
promise, the conditional independence assumption in dLLMs causes parallel
decoding to ignore token dependencies, inevitably degrading generation quality
when these dependencies are strong. However, existing works largely overlook
these inherent challenges, and evaluations on standard benchmarks (e.g., math
and coding) are not sufficient to capture the quality degradation caused by
parallel decoding. To address this gap, we first provide an
information-theoretic analysis of parallel decoding. We then conduct case
studies on analytically tractable synthetic list operations from both data
distribution and decoding strategy perspectives, offering quantitative insights
that highlight the fundamental limitations of parallel decoding. Building on
these insights, we propose ParallelBench, the first benchmark specifically
designed for dLLMs, featuring realistic tasks that are trivial for humans and
autoregressive LLMs yet exceptionally challenging for dLLMs under parallel
decoding. Using ParallelBench, we systematically analyze both dLLMs and
autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can
suffer dramatic quality degradation in real-world scenarios, and (ii) current
parallel decoding strategies struggle to adapt their degree of parallelism
based on task difficulty, thus failing to achieve meaningful speedup without
compromising quality. Our findings underscore the pressing need for innovative
decoding methods that can overcome the current speed-quality trade-off. We
release our benchmark to help accelerate the development of truly efficient
dLLMs.
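
The failure mode described above can be made concrete with a toy two-token distribution. The sketch below is a hypothetical illustration, not code or notation from the paper: it compares sequential sampling from the true joint against fully parallel sampling, where each token is drawn from its exact marginal, as parallel decoding effectively does under a conditional independence assumption. Because the joint puts all its mass on matched token pairs, the product of marginals assigns half its mass to invalid outputs; the gap is the mutual information between the two tokens (1 bit here), though the paper's exact information-theoretic formulation may differ.

# Toy illustration (hypothetical, not from the paper): why parallel
# decoding under conditional independence degrades quality when tokens
# are strongly dependent.
import random

random.seed(0)

# True joint distribution over a two-token answer: only "yes yes" and
# "no no" are valid, each with probability 0.5. The tokens are perfectly
# correlated (1 bit of mutual information).
JOINT = {("yes", "yes"): 0.5, ("no", "no"): 0.5}

def sample_sequential():
    # Autoregressive-style sampling from the true joint: draw token 1,
    # then token 2 conditioned on token 1 (here, forced to match).
    first = random.choice(["yes", "no"])
    return (first, first)

def sample_parallel():
    # Parallel decoding under conditional independence: each token is
    # drawn from its own (exact) marginal, ignoring the other token.
    return (random.choice(["yes", "no"]), random.choice(["yes", "no"]))

N = 100_000
seq_ok = sum(sample_sequential() in JOINT for _ in range(N)) / N
par_ok = sum(sample_parallel() in JOINT for _ in range(N)) / N
print(f"valid outputs, sequential decoding: {seq_ok:.3f}")  # ~1.000
print(f"valid outputs, parallel decoding:   {par_ok:.3f}")  # ~0.500

Running the script prints roughly 1.000 for sequential decoding and 0.500 for parallel decoding: every marginal is exact, yet half of the factorized samples are invalid, which mirrors the speed-quality trade-off the benchmark is designed to expose.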