
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

October 6, 2025
Authors: Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, Kangwook Lee
cs.AI

Abstract

While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the conditional independence assumption in dLLMs causes parallel decoding to ignore token dependencies, inevitably degrading generation quality when these dependencies are strong. However, existing works largely overlook these inherent challenges, and evaluations on standard benchmarks (e.g., math and coding) are not sufficient to capture the quality degradation caused by parallel decoding. To address this gap, we first provide an information-theoretic analysis of parallel decoding. We then conduct case studies on analytically tractable synthetic list operations from both data distribution and decoding strategy perspectives, offering quantitative insights that highlight the fundamental limitations of parallel decoding. Building on these insights, we propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Using ParallelBench, we systematically analyze both dLLMs and autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can suffer dramatic quality degradation in real-world scenarios, and (ii) current parallel decoding strategies struggle to adapt their degree of parallelism based on task difficulty, thus failing to achieve meaningful speedup without compromising quality. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off. We release our benchmark to help accelerate the development of truly efficient dLLMs.
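
To make the abstract's core failure mode concrete, here is a minimal toy sketch (ours, not the paper's): when the joint distribution only permits "New York" or "Los Angeles", decoding both tokens in parallel from their marginals, as the conditional independence assumption implies, produces incoherent mixtures like "New Angeles" about half the time, while autoregressive decoding stays coherent.

```python
import random

# Toy joint distribution with strong token dependencies:
# the only valid two-token outputs are "New York" and "Los Angeles".
VALID = {("New", "York"), ("Los", "Angeles")}

def sample_autoregressive():
    # The second token is decoded conditioned on the first: always coherent.
    first = random.choice(["New", "Los"])
    second = "York" if first == "New" else "Angeles"
    return first, second

def sample_parallel():
    # Conditional independence: each position is sampled from its
    # marginal, ignoring the other position entirely.
    first = random.choice(["New", "Los"])        # marginal P = 0.5 each
    second = random.choice(["York", "Angeles"])  # marginal P = 0.5 each
    return first, second

n = 100_000
ar = sum(sample_autoregressive() in VALID for _ in range(n)) / n
par = sum(sample_parallel() in VALID for _ in range(n)) / n
print(f"autoregressive coherence: {ar:.2f}")           # ~1.00
print(f"parallel (independent) coherence: {par:.2f}")  # ~0.50
```

The stronger the inter-token dependencies, the larger this gap becomes, which is exactly the regime the paper's synthetic list-operation case studies and ParallelBench tasks are designed to probe.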
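For context on finding (ii), many masked-diffusion LLMs parallelize decoding by committing, at each step, every masked position whose prediction confidence clears a threshold. The sketch below illustrates that general recipe under assumed interfaces; `model`, `mask_id`, and `tau` are illustrative placeholders, not the paper's method or any specific model's API.

```python
import torch

def threshold_parallel_decode(model, tokens, mask_id, tau=0.9, max_steps=64):
    """Hedged sketch of confidence-thresholded parallel unmasking.

    Each step: run the model once, then commit every masked position
    whose top-1 probability exceeds tau (at least one per step).
    """
    for _ in range(max_steps):
        masked = (tokens == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break  # fully decoded
        logits = model(tokens.unsqueeze(0))[0]  # assumed output: (seq, vocab) logits
        probs = logits[masked].softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        commit = conf >= tau
        if not commit.any():
            commit[conf.argmax()] = True        # fallback: commit the most confident token
        tokens[masked[commit]] = pred[commit]   # these positions are decoded in parallel
    return tokens
```

A fixed threshold makes the speed-quality trade-off rigid: with a high `tau`, decoding degenerates toward one token per step and yields no speedup, while with a low `tau`, many mutually dependent tokens are committed independently in the same step, reproducing the coherence failure sketched above.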