dVoting: A Fast Voting Mechanism for Diffusion Large Language Models

Abstract

Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally supporting a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, which gives them significant potential for parallel test-time scaling, something that was previously constrained by the severe inefficiency of autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, at an acceptable additional computational cost. dVoting is motivated by the observation that, across multiple samples for the same prompt, most token predictions remain consistent, whereas performance is determined by a small subset of tokens that vary across samples. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement: it samples, identifies uncertain tokens via consistency analysis, regenerates them through voting, and repeats this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks, with gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting
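
The sample, analyze, regenerate loop described in the abstract could be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the names `consistency`, `iterative_vote`, and `toy_infill`, the `infill` callable (a hypothetical stand-in for a dLLM's arbitrary-position generation), the fixed sequence length, the agreement threshold, and the round limit are all invented here. The abstract does not specify how uncertain tokens are identified or how votes are aggregated, so simple per-position majority voting is used in their place.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Token = str
# Hypothetical stand-in for a dLLM infilling call (an assumption, not a real API):
# given a draft whose uncertain positions are None, return `n` complete candidates
# that keep every non-None token unchanged.
Infill = Callable[[List[Optional[Token]], int], List[List[Token]]]


def consistency(
    samples: Sequence[Sequence[Token]], threshold: float
) -> Tuple[List[Token], List[int]]:
    """Majority-vote each position and list positions whose agreement is below threshold."""
    voted: List[Token] = []
    uncertain: List[int] = []
    for pos in range(len(samples[0])):
        token, count = Counter(s[pos] for s in samples).most_common(1)[0]
        voted.append(token)
        if count / len(samples) < threshold:
            uncertain.append(pos)
    return voted, uncertain


def iterative_vote(
    infill: Infill,
    length: int,
    n_samples: int = 4,
    threshold: float = 1.0,
    max_rounds: int = 8,
) -> List[Token]:
    """Sample, find unstable positions, regenerate only those, repeat until stable."""
    draft: List[Optional[Token]] = [None] * length  # round 1: everything is open
    voted: List[Token] = []
    for _ in range(max_rounds):
        samples = infill(draft, n_samples)
        voted, uncertain = consistency(samples, threshold)
        if not uncertain:  # all samples agree everywhere: treat as converged
            break
        masked = set(uncertain)
        # Freeze stable tokens; reopen only the positions that disagreed.
        draft = [None if i in masked else tok for i, tok in enumerate(voted)]
    return voted


def toy_infill(draft: List[Optional[Token]], n: int) -> List[List[Token]]:
    """Deterministic toy generator: one sample errs on the last token in round 1."""
    target = ["2", "+", "2", "=", "4"]
    first_round = all(tok is None for tok in draft)
    out = []
    for k in range(n):
        seq = []
        for i, tok in enumerate(draft):
            if tok is not None:
                seq.append(tok)  # keep frozen tokens
            elif first_round and k == 0 and i == len(draft) - 1:
                seq.append("5")  # the single disagreeing prediction
            else:
                seq.append(target[i])
        out.append(seq)
    return out


print(iterative_vote(toy_infill, length=5))
```

In this toy run, only the final position disagrees in the first round, so the second round regenerates that single token rather than the whole sequence. That selective regeneration is the source of the savings the abstract attributes to arbitrary-position generation.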