A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
October 17, 2025
Authors: Zhi Zhou, Yuhao Tan, Zenan Li, Yuan Yao, Lan-Zhe Guo, Yu-Feng Li, Xiaoxing Ma
cs.AI
Abstract
Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach in this field is sampling-based test-time scaling, which enhances reasoning by generating multiple reasoning paths for a given input during inference. However, despite its practical success, its theoretical foundations remain underexplored. In this paper, we provide the first theoretical framework for analyzing sampling-based test-time scaling methods, grounded in the perspective of confidence estimation. Based on this framework, we analyze two dominant paradigms, self-consistency and perplexity, and reveal their key limitations: self-consistency suffers from high estimation error, while perplexity exhibits substantial model error and a possible degradation of estimation-error convergence. To address these limitations, we introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning. Perplexity Consistency combines the strengths of self-consistency and perplexity, boosting the convergence rate of the estimation error from linear to exponential while preserving model error. Reasoning Pruning prevents degradation by eliminating low-probability reasoning paths. Both theoretical analysis and empirical results across seven benchmark datasets demonstrate that RPC has strong potential to reduce reasoning error. Notably, RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%. The code and resources are available at https://wnjxyk.github.io/RPC.
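To make the contrast concrete, the following is a minimal sketch of the two baseline confidence estimators the abstract names, plus a hypothetical reading of RPC's two components: pruning low-probability paths, then aggregating the survivors with probability-weighted voting. The function names, the threshold `tau`, and the exact aggregation rule are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter
import math


def self_consistency(answers):
    """Majority vote over sampled final answers; confidence = vote share."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def perplexity_confidence(token_logprobs):
    """Confidence of one reasoning path from its token log-probs.

    exp(mean log p) is the inverse perplexity of the sequence.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def rpc(paths, tau=0.1):
    """Hypothetical RPC sketch (not the authors' code).

    `paths` is a list of (answer, confidence) pairs, where confidence
    could come from perplexity_confidence above.
    """
    # Reasoning Pruning: discard low-probability reasoning paths.
    kept = [(ans, conf) for ans, conf in paths if conf >= tau]
    if not kept:  # fall back if pruning removes everything
        kept = paths
    # Perplexity Consistency: weight each vote by the path's probability.
    scores = Counter()
    for ans, conf in kept:
        scores[ans] += conf
    answer, score = scores.most_common(1)[0]
    return answer, score / sum(scores.values())
```

Under this reading, plain self-consistency treats every sampled path equally (linear convergence of the vote-share estimate), while weighting by path probability lets high-confidence paths dominate sooner, at the cost of inheriting the model's probability estimates; pruning guards against the degradation caused by near-zero-probability paths.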