ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

Abstract

While most autoregressive LLMs are constrained to token-by-token decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the conditional independence assumption in dLLMs causes parallel decoding to ignore token dependencies, inevitably degrading generation quality when these dependencies are strong. However, existing work largely overlooks these inherent challenges, and evaluations on standard benchmarks (e.g., math and coding) are not sufficient to capture the quality degradation caused by parallel decoding. To address this gap, we first provide an information-theoretic analysis of parallel decoding. We then conduct case studies on analytically tractable synthetic list operations from both data distribution and decoding strategy perspectives, offering quantitative insights that highlight the fundamental limitations of parallel decoding. Building on these insights, we propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Using ParallelBench, we systematically analyze both dLLMs and autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can suffer dramatic quality degradation in real-world scenarios, and (ii) current parallel decoding strategies struggle to adapt their degree of parallelism based on task difficulty, thus failing to achieve meaningful speedup without compromising quality. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off. We release our benchmark to help accelerate the development of truly efficient dLLMs.
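The conditional independence issue can be illustrated with a minimal sketch. The toy task below (emit a shuffle of the two-item list ["A", "B"]) and all names in the code are illustrative assumptions, not tasks or code from ParallelBench. Decoding both positions in one parallel step is modeled as sampling each token from its own marginal, which is the product distribution; the sketch measures how much probability mass that puts on valid outputs and how far it sits from the true joint distribution.

```python
import math
from itertools import product

# Hypothetical toy task: the output is a shuffle of ["A", "B"], so only
# "AB" and "BA" are valid and the two positions are fully dependent.
vocab = ["A", "B"]
joint = {("A", "B"): 0.5, ("B", "A"): 0.5}


def marginal(dist, position):
    """Marginal distribution of the token at a single position."""
    probs = {tok: 0.0 for tok in vocab}
    for seq, p in dist.items():
        probs[seq[position]] += p
    return probs


m0, m1 = marginal(joint, 0), marginal(joint, 1)

# Assumption: decoding both tokens in one parallel step samples each
# position independently, i.e. from the product of the marginals.
parallel = {(a, b): m0[a] * m1[b] for a, b in product(vocab, repeat=2)}

# Probability that a one-step parallel sample is a valid shuffle.
valid_mass_parallel = sum(p for seq, p in parallel.items() if seq in joint)

# KL(joint || product of marginals) in bits: the dependency information
# that the independent approximation discards.
kl_bits = sum(p * math.log2(p / parallel[seq]) for seq, p in joint.items())

print(valid_mass_parallel)  # 0.5
print(kl_bits)              # 1.0
```

In this toy setting, decoding one token at a time and conditioning the second on the first would always yield a valid shuffle, while the one-step parallel sample is valid only half the time.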
