ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

Abstract

While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the conditional independence assumption in dLLMs causes parallel decoding to ignore token dependencies, inevitably degrading generation quality when these dependencies are strong. However, existing works largely overlook these inherent challenges, and evaluations on standard benchmarks (e.g., math and coding) do not adequately capture the quality degradation caused by parallel decoding. To address this gap, we first provide an information-theoretic analysis of parallel decoding. We then conduct case studies on analytically tractable synthetic list operations from both data distribution and decoding strategy perspectives, offering quantitative insights that highlight the fundamental limitations of parallel decoding. Building on these insights, we propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Using ParallelBench, we systematically analyze both dLLMs and autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can suffer dramatic quality degradation in real-world scenarios, and (ii) current parallel decoding strategies struggle to adapt their degree of parallelism to task difficulty, thus failing to achieve meaningful speedup without compromising quality. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off. We release our benchmark to help accelerate the development of truly efficient dLLMs.
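The conditional independence issue described above can be illustrated with a toy calculation. The following is a minimal sketch and is not taken from the paper: the two-token "shuffle the list [A, B]" task, the function names, and the use of the KL divergence between the joint and the product of its marginals (often called total correlation) as the dependency measure are all assumptions made for illustration. It compares the true joint distribution over two output tokens with what is obtained when each position is sampled independently from its marginal, as in a single fully parallel decoding step.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy task: output a uniformly random ordering of ["A", "B"].
# Only the sequences "A B" and "B A" have nonzero probability.
JOINT = {("A", "B"): 0.5, ("B", "A"): 0.5}


def marginals(joint):
    """Per-position marginal distributions of a joint over token tuples."""
    length = len(next(iter(joint)))
    out = [Counter() for _ in range(length)]
    for seq, p in joint.items():
        for i, tok in enumerate(seq):
            out[i][tok] += p
    return out


def parallel_distribution(joint):
    """Distribution obtained when every position is sampled independently
    from its marginal (the conditional independence assumption)."""
    margs = marginals(joint)
    dist = {}
    for seq in itertools.product(*(m.keys() for m in margs)):
        p = 1.0
        for m, tok in zip(margs, seq):
            p *= m[tok]
        dist[seq] = p
    return dist


def invalid_mass(joint):
    """Probability that independent sampling emits a sequence that has
    zero probability under the true joint."""
    par = parallel_distribution(joint)
    return sum(p for seq, p in par.items() if joint.get(seq, 0.0) == 0.0)


def total_correlation_bits(joint):
    """KL(joint || product of marginals) in bits: the dependency that
    independent sampling discards."""
    par = parallel_distribution(joint)
    return sum(p * math.log2(p / par[seq]) for seq, p in joint.items() if p > 0)


print(invalid_mass(JOINT))            # 0.5
print(total_correlation_bits(JOINT))  # 1.0
```

In this toy setting, half of the independently sampled outputs ("A A" or "B B") are invalid even though each position's marginal is exactly right, which is the kind of failure that grows as token dependencies get stronger. When the tokens are truly independent, both quantities are zero and parallel sampling loses nothing.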
