dVoting: Fast Voting for dLLMs

Abstract

Diffusion large language models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, which gives them significant potential for parallel test-time scaling, previously constrained by severe inefficiency in autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, at an acceptable additional computational overhead. dVoting is motivated by the observation that, across multiple samples for the same prompt, most token predictions remain consistent, whereas performance is determined by a small subset of tokens that vary across samples. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement: it samples, identifies uncertain tokens via consistency analysis, regenerates them through voting, and repeats this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks, with gains of 6.22% to 7.66% on GSM8K, 4.40% to 7.20% on MATH500, 3.16% to 14.84% on ARC-C, and 4.83% to 5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting.
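
The sample, check-consistency, regenerate, repeat loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked in the abstract for that). The abstract does not specify the consistency criterion or the voting rule, so the agreement threshold, the per-position majority vote, and the names `MASK`, `sample_fn`, `consistency_vote`, `iterative_vote`, `num_samples`, `threshold`, and `max_rounds` are all assumptions made for illustration. `sample_fn` stands in for a dLLM that fills only the masked positions of a template and leaves fixed tokens untouched.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical marker for a position that still has to be (re)generated.
MASK = None

# Assumed sampler interface: takes a template whose entries are either a fixed
# token or MASK, and returns a fully filled sequence of the same length.
Sampler = Callable[[List[Optional[str]]], List[str]]


def consistency_vote(samples: List[List[str]], threshold: float) -> List[Optional[str]]:
    """Keep the majority token at each position if enough samples agree, else MASK."""
    voted: List[Optional[str]] = []
    for pos in range(len(samples[0])):
        token, count = Counter(s[pos] for s in samples).most_common(1)[0]
        agreement = count / len(samples)
        voted.append(token if agreement >= threshold else MASK)
    return voted


def iterative_vote(
    sample_fn: Sampler,
    length: int,
    num_samples: int = 4,     # assumed number of samples per round
    threshold: float = 0.75,  # assumed agreement needed to accept a token
    max_rounds: int = 5,      # assumed cap on refinement rounds
) -> List[str]:
    """Resample only the inconsistent positions until every position is settled."""
    template: List[Optional[str]] = [MASK] * length
    for _ in range(max_rounds):
        samples = [sample_fn(template) for _ in range(num_samples)]
        template = consistency_vote(samples, threshold)
        if MASK not in template:
            return template  # converged: all positions are consistent
    # Assumed fallback: plain majority vote for positions that never stabilised.
    samples = [sample_fn(template) for _ in range(num_samples)]
    return consistency_vote(samples, threshold=0.0)
```

Under these assumptions, positions on which the samples already agree are fixed after the first round, so later rounds spend generation only on the small set of tokens that vary across samples.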
