
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

November 9, 2025
作者: Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, Elliott Ash, Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan
cs.AI

Abstract

Solving complex tasks usually requires LLMs to generate long multi-step reasoning chains. Previous work has shown that verifying the correctness of individual reasoning steps can further improve the performance and efficiency of LLMs on such tasks and enhance solution interpretability. However, existing verification approaches, such as Process Reward Models (PRMs), are either computationally expensive, limited to specific domains, or require large-scale human or model-generated annotations. Thus, we propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. We train transformer-based uncertainty quantification heads (UHeads) that use the internal states of a frozen LLM to estimate the uncertainty of its reasoning steps during generation. The approach is fully automatic: target labels are generated either by another larger LLM (e.g., DeepSeek R1) or in a self-supervised manner by the original model itself. UHeads are both effective and lightweight, containing less than 10M parameters. Across multiple domains, including mathematics, planning, and general knowledge question answering, they match or even surpass the performance of PRMs that are up to 810x larger. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification, offering a promising direction toward scalable and generalizable introspective LLMs.
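As a rough illustration of the approach described in the abstract, the sketch below shows what a lightweight uncertainty quantification head (UHead) over a frozen LLM's hidden states might look like. All class names, dimensions, pooling choices, and the training loop are illustrative assumptions, not the paper's actual implementation; they only demonstrate how a sub-10M-parameter transformer head could map per-step hidden states to a scalar uncertainty score.

```python
# Minimal sketch of a step-level uncertainty head (UHead) on a frozen LLM.
# Hypothetical names and shapes; the paper's architecture/training may differ.
from typing import Optional

import torch
import torch.nn as nn


class UncertaintyHead(nn.Module):
    """Small transformer mapping a frozen LLM's hidden states for one
    reasoning step to a scalar uncertainty score in [0, 1]."""

    def __init__(self, hidden_size: int = 4096, d_model: int = 256,
                 num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(hidden_size, d_model)  # compress LLM states
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.cls = nn.Linear(d_model, 1)  # step-level score

    def forward(self, step_hidden_states: torch.Tensor,
                padding_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # step_hidden_states: (batch, step_len, hidden_size) from the frozen LLM
        x = self.proj(step_hidden_states)
        x = self.encoder(x, src_key_padding_mask=padding_mask)
        pooled = x.mean(dim=1)  # pool the tokens of the reasoning step
        return torch.sigmoid(self.cls(pooled)).squeeze(-1)


if __name__ == "__main__":
    # Toy training step: binary labels (step correct / incorrect) would come
    # from a larger LLM judge or a self-supervised scheme, per the abstract.
    head = UncertaintyHead()
    states = torch.randn(8, 32, 4096)            # fake hidden states, 8 steps
    labels = torch.randint(0, 2, (8,)).float()   # fake correctness labels
    loss = nn.functional.binary_cross_entropy(head(states), labels)
    loss.backward()
```

With the small default dimensions used here, the head stays well under the 10M-parameter budget the abstract mentions, since the frozen LLM itself contributes no trainable parameters.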