The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute

November 4, 2025
Authors: Aman Sharma, Paras Chopra
cs.AI

Abstract

We revisit test-time scaling for language model reasoning and ask a fundamental question: at equal token budget and compute, is it better to run multiple independent chains in parallel, or to run fewer chains that iteratively refine through sequential steps? Through comprehensive evaluation across 5 state-of-the-art open-source models and 3 challenging reasoning benchmarks, we find that sequential scaling, where chains explicitly build upon previous attempts, consistently outperforms the dominant parallel self-consistency paradigm in 95.6% of configurations, with accuracy gains of up to 46.7%. Further, we introduce inverse-entropy weighted voting, a novel training-free method that further boosts the accuracy of sequential scaling. By weighting answers in proportion to the inverse entropy of their reasoning chains, we improve the success rate over parallel majority voting and establish this approach as the optimal test-time scaling strategy. Our findings fundamentally challenge the parallel reasoning orthodoxy that has dominated test-time scaling since Wang et al. (2022) introduced self-consistency decoding, positioning sequential refinement as the robust default for modern LLM reasoning and necessitating a paradigm shift in how we approach inference-time optimization.
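
The sketch below illustrates what inverse-entropy weighted voting could look like in Python, under stated assumptions: each chain's confidence is summarized as the mean per-token entropy of its decoded reasoning, and the weight is its reciprocal with a small epsilon for numerical safety. The function names (`chain_entropy`, `inverse_entropy_vote`), the epsilon smoothing, and the choice of mean token entropy are illustrative assumptions, not the paper's exact implementation.

```python
from collections import defaultdict
from typing import List, Sequence


def chain_entropy(token_entropies: Sequence[float]) -> float:
    """Mean per-token entropy of one reasoning chain (in nats)."""
    return sum(token_entropies) / max(len(token_entropies), 1)


def inverse_entropy_vote(
    answers: List[str],
    per_chain_token_entropies: List[Sequence[float]],
    eps: float = 1e-6,
) -> str:
    """Pick a final answer by weighting each chain with 1 / (entropy + eps).

    Chains whose reasoning was low-entropy (the model was confident at each
    decoding step) contribute more to the vote than high-entropy chains.
    Plain majority voting is the special case where every weight is 1.
    """
    scores = defaultdict(float)
    for answer, entropies in zip(answers, per_chain_token_entropies):
        weight = 1.0 / (chain_entropy(entropies) + eps)
        scores[answer] += weight
    return max(scores, key=scores.get)


# Example: three chains, two candidate answers; the confident (low-entropy)
# chains dominate the weighted vote.
answers = ["42", "41", "42"]
entropies = [[0.2, 0.3, 0.1], [1.5, 1.2, 1.8], [0.4, 0.5, 0.3]]
print(inverse_entropy_vote(answers, entropies))  # -> "42"
```

In a sequential-scaling setup, the same aggregation would be applied over the answers produced at successive refinement steps of a chain rather than over independent parallel samples; only the source of the candidate answers changes, not the voting rule.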