The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
January 12, 2026
Authors: Weihao Xuan, Qingcheng Zeng, Heli Qi, Yunze Xiao, Junjue Wang, Naoto Yokoya
cs.AI
Abstract
Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar of this trustworthiness is calibration, which refers to an agent's ability to express confidence that reliably reflects its actual performance. While calibration is well-established for static models, its dynamics in tool-integrated agentic workflows remain underexplored. In this work, we systematically investigate verbalized calibration in tool-use agents, revealing a fundamental confidence dichotomy driven by tool type. Specifically, our pilot study identifies that evidence tools (e.g., web search) systematically induce severe overconfidence due to inherent noise in retrieved information, while verification tools (e.g., code interpreters) can ground reasoning through deterministic feedback and mitigate miscalibration. To robustly improve calibration across tool types, we propose a reinforcement learning (RL) fine-tuning framework that jointly optimizes task accuracy and calibration, supported by a holistic benchmark of reward designs. We demonstrate that our trained agents not only achieve superior calibration but also exhibit robust generalization from local training environments to noisy web settings and to distinct domains such as mathematical reasoning. Our results highlight the necessity of domain-specific calibration strategies for tool-use agents. More broadly, this work establishes a foundation for building self-aware agents that can reliably communicate uncertainty in high-stakes, real-world deployments.
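To make the "jointly optimizes task accuracy and calibration" idea concrete, the sketch below shows one plausible shape for such a reward: a correctness term combined with a Brier-style penalty on the agent's verbalized confidence. This is a minimal illustration under assumed conventions; the function name, the 0.5 weighting, and the specific reward form are assumptions for exposition, not the reward designs benchmarked in the paper.

```python
# Illustrative sketch (not the paper's implementation): a per-episode reward
# that jointly scores task correctness and the calibration of a verbalized
# confidence, in the spirit of a Brier-style penalty.

def joint_reward(is_correct: bool, verbalized_confidence: float,
                 calibration_weight: float = 0.5) -> float:
    """Combine a task-accuracy term with a calibration term.

    is_correct: whether the agent's final answer was judged correct.
    verbalized_confidence: the agent's self-reported confidence in [0, 1].
    calibration_weight: assumed trade-off between the two terms.
    """
    accuracy_term = 1.0 if is_correct else 0.0
    # Brier-style term: rewards stated confidence that matches the outcome.
    calibration_term = 1.0 - (verbalized_confidence - accuracy_term) ** 2
    return (1.0 - calibration_weight) * accuracy_term \
        + calibration_weight * calibration_term


# Example: an overconfident wrong answer earns less reward than a
# well-calibrated wrong answer, discouraging systematic overconfidence.
print(joint_reward(is_correct=False, verbalized_confidence=0.95))  # ~0.05
print(joint_reward(is_correct=False, verbalized_confidence=0.20))  # ~0.48
```

Under a reward of this general shape, an RL fine-tuned agent is pushed both to answer correctly and to report confidence that tracks its actual success rate, which is the joint objective the abstract describes.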