dVoting: Fast Voting for dLLMs
February 12, 2026
Authors: Sicheng Feng, Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
cs.AI
Abstract
Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, endowing them with significant potential for parallel test-time scaling, an avenue previously constrained by the severe inefficiency of autoregressive modeling. In this work, we introduce dVoting, a fast, training-free voting technique that boosts reasoning capability at only acceptable extra computational cost. dVoting is motivated by the observation that, across multiple samples for the same prompt, most token predictions remain consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement: it samples multiple generations, identifies uncertain tokens via consistency analysis, regenerates them through voting, and repeats this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across benchmarks, with gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting
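To make the procedure concrete, below is a minimal, self-contained Python sketch of the voting loop described in the abstract: draw several samples for the same prompt, freeze positions where the samples agree, and resample only the disagreeing positions until they stabilize. This is not the authors' released implementation; `generate_fn`, the `fixed` position-to-token dictionary, `num_samples`, `max_rounds`, and the toy sampler are illustrative assumptions standing in for a dLLM's arbitrary-position decoding.

```python
import random
from collections import Counter

def dvoting(generate_fn, num_samples=4, max_rounds=5):
    """Illustrative sketch of dVoting-style iterative refinement.

    generate_fn(fixed) is assumed to return one token sequence for the
    prompt, keeping the tokens in the `fixed` {position: token} dict in
    place and regenerating only the remaining positions (mimicking a
    dLLM's arbitrary-position generation).
    """
    fixed, consensus = {}, {}
    for _ in range(max_rounds):
        # 1) Draw several samples for the same prompt.
        samples = [generate_fn(fixed) for _ in range(num_samples)]
        length = min(len(s) for s in samples)

        # 2) Consistency analysis: majority vote at every position.
        consensus, uncertain = {}, []
        for pos in range(length):
            token, votes = Counter(s[pos] for s in samples).most_common(1)[0]
            consensus[pos] = token
            if votes < num_samples:          # cross-sample disagreement
                uncertain.append(pos)

        # 3) Converged: every position agrees across all samples.
        if not uncertain:
            break

        # 4) Freeze the agreed tokens; only the uncertain positions are
        #    regenerated in the next round.
        fixed = {p: t for p, t in consensus.items() if p not in uncertain}

    return [consensus[i] for i in range(len(consensus))]


# Toy usage: a stochastic "model" that is noisy only at position 2.
def toy_generate(fixed):
    base = ["The", "answer", random.choice(["42", "41"]), "."]
    return [fixed.get(i, tok) for i, tok in enumerate(base)]

random.seed(0)
print(dvoting(toy_generate))  # majority-voted completion, e.g. ['The', 'answer', '42', '.']
```

In an actual dLLM, `generate_fn` would correspond to a diffusion decoding pass that keeps the frozen tokens in place and denoises only the flagged positions in parallel, which is what makes this refinement loop cheap relative to repeated full regeneration.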