A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
October 17, 2025
Authors: Zhi Zhou, Yuhao Tan, Zenan Li, Yuan Yao, Lan-Zhe Guo, Yu-Feng Li, Xiaoxing Ma
cs.AI
Abstract
Test-time scaling seeks to improve the reasoning performance of large
language models (LLMs) by adding computational resources. A prevalent approach
within the field is sampling-based test-time scaling methods, which enhance
reasoning by generating multiple reasoning paths for a given input during
inference. However, despite their practical success, their theoretical foundations
remain underexplored. In this paper, we provide the first theoretical framework
for analyzing sampling-based test-time scaling methods, grounded in the
perspective of confidence estimation. Based on this framework, we analyze two
dominant paradigms, self-consistency and perplexity, and reveal their key
limitations: self-consistency suffers from high estimation error, while
perplexity exhibits substantial modeling error and a possible degradation in the
convergence of its estimation error. To address these limitations, we introduce RPC, a
hybrid method that leverages our theoretical insights through two key
components: Perplexity Consistency and Reasoning Pruning. Perplexity
Consistency combines the strengths of self-consistency and perplexity, boosting
the convergence rate of estimation error from linear to exponential while
preserving model error. Reasoning Pruning prevents degradation by eliminating
low-probability reasoning paths. Both theoretical analysis and empirical
results across seven benchmark datasets demonstrate that RPC has strong
potential to reduce reasoning error. Notably, RPC achieves reasoning
performance comparable to self-consistency while not only enhancing confidence
reliability but also reducing sampling costs by 50%. The code and resources are
available at https://wnjxyk.github.io/RPC.
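
To make the paradigms discussed in the abstract concrete, the following is a minimal sketch of sampling-based confidence estimation: majority voting for self-consistency, sequence-probability selection for perplexity, and a hypothetical hybrid that weights votes by path probability after pruning low-probability paths. The function names, the probability-weighted voting scheme, and the prune_threshold value are illustrative assumptions, not the authors' RPC implementation.

```python
"""Illustrative sketch of sampling-based confidence estimators (assumed, simplified)."""
from collections import defaultdict
from math import exp


def self_consistency(samples):
    """Majority-vote confidence: `samples` is a list of final answers drawn from
    multiple sampled reasoning paths; an answer's confidence is its vote share."""
    votes = defaultdict(int)
    for answer in samples:
        votes[answer] += 1
    best = max(votes, key=votes.get)
    return best, votes[best] / len(samples)


def perplexity_confidence(paths):
    """Perplexity-style confidence: `paths` is a list of (answer, avg_log_prob)
    pairs; pick the path with the highest average token log-probability,
    i.e. the lowest perplexity."""
    answer, avg_log_prob = max(paths, key=lambda p: p[1])
    return answer, exp(avg_log_prob)  # higher value = model is more confident


def perplexity_consistency(paths, prune_threshold=0.1):
    """Hypothetical hybrid in the spirit of RPC: drop low-probability reasoning
    paths, then aggregate the surviving paths' sequence probabilities per answer
    instead of counting raw votes."""
    weights = defaultdict(float)
    for answer, avg_log_prob in paths:
        p = exp(avg_log_prob)
        if p < prune_threshold:        # pruning step: discard unreliable paths
            continue
        weights[answer] += p           # probability-weighted vote per answer
    if not weights:                    # fall back if every path was pruned
        return perplexity_confidence(paths)
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())
```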