The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute
November 4, 2025
Authors: Aman Sharma, Paras Chopra
cs.AI
Abstract
We revisit test-time scaling for language model reasoning and ask a fundamental question: at an equal token budget and compute, is it better to run multiple independent chains in parallel, or to run fewer chains that iteratively refine through sequential steps? Through comprehensive evaluation across 5 state-of-the-art open-source models and 3 challenging reasoning benchmarks, we find that sequential scaling, in which chains explicitly build upon previous attempts, consistently outperforms the dominant parallel self-consistency paradigm in 95.6% of configurations, with accuracy gains of up to 46.7%. Further, we introduce inverse-entropy weighted voting, a novel training-free method that further boosts the accuracy of sequential scaling. By weighting answers in proportion to the inverse entropy of their reasoning chains, we improve the success rate over parallel majority voting and establish the approach as the optimal test-time scaling strategy. Our findings fundamentally challenge the parallel reasoning orthodoxy that has dominated test-time scaling since self-consistency decoding (Wang et al., 2022), positioning sequential refinement as the robust default for modern LLM reasoning and necessitating a paradigm shift in how we approach inference-time optimization.
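
To make the voting rule concrete, here is a minimal Python sketch of inverse-entropy weighted voting as described in the abstract. It assumes each reasoning chain exposes a scalar entropy (for instance, the mean token-level entropy of the chain); the exact entropy definition, and the function and variable names below, are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

def inverse_entropy_vote(answers, entropies, eps=1e-8):
    """Select a final answer by weighting each candidate answer by the
    inverse entropy of the reasoning chain that produced it.

    answers   -- final answer extracted from each chain
    entropies -- assumed scalar entropy per chain (e.g., mean token entropy)
    """
    scores = defaultdict(float)
    for ans, h in zip(answers, entropies):
        # Lower-entropy (more confident) chains contribute larger weight.
        scores[ans] += 1.0 / (h + eps)
    return max(scores, key=scores.get)

# Hypothetical usage: three chains, two agree on "42" but with higher entropy,
# one confident chain answers "41".
answers = ["42", "42", "41"]
entropies = [0.35, 0.50, 0.20]
print(inverse_entropy_vote(answers, entropies))  # prints "41"
```

Note that in this toy example the weighted vote overturns the plain majority: a single low-entropy chain outweighs two higher-entropy chains, which is the behavior that distinguishes this rule from standard self-consistency majority voting.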