Are LLM Decisions Faithful to Verbal Confidence?

January 12, 2026
Authors: Jiawei Wang, Yanfei Zhou, Siddartha Devic, Deqing Fu
cs.AI

Abstract

Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the reasoning, knowledge, or decision making of the model. To test this, we introduce RiskEval: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal and risk-sensitive decisions.
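To make concrete why heavy penalties make abstention the mathematically optimal strategy, the sketch below applies a standard expected-utility abstention rule. The scoring (reward 1 for a correct answer, penalty λ for an error, utility 0 for abstaining) and the function name are illustrative assumptions for exposition, not necessarily the exact RiskEval scoring.

```python
def should_answer(confidence: float, penalty: float, reward: float = 1.0) -> bool:
    """Expected-utility rule: answer only if the expected gain beats abstaining (utility 0).

    Expected utility of answering = confidence * reward - (1 - confidence) * penalty.
    Illustrative assumption; the paper's RiskEval scoring may differ in detail.
    """
    expected_utility = confidence * reward - (1.0 - confidence) * penalty
    return expected_utility > 0.0


# Example: with a 10x error penalty, answering is only worthwhile above
# confidence 10 / 11 ≈ 0.909; otherwise abstaining is optimal.
for p in (0.5, 0.8, 0.95):
    print(f"confidence={p}: answer={should_answer(confidence=p, penalty=10.0)}")
```

Under this rule, a model whose verbal confidence is well calibrated should abstain more often as the penalty grows; the paper's finding is that frontier models largely fail to make this adjustment.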