The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
January 12, 2026
Authors: Weihao Xuan, Qingcheng Zeng, Heli Qi, Yunze Xiao, Junjue Wang, Naoto Yokoya
cs.AI
Abstract
Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar of this trustworthiness is calibration, which refers to an agent's ability to express confidence that reliably reflects its actual performance. While calibration is well-established for static models, its dynamics in tool-integrated agentic workflows remain underexplored. In this work, we systematically investigate verbalized calibration in tool-use agents, revealing a fundamental confidence dichotomy driven by tool type. Specifically, our pilot study identifies that evidence tools (e.g., web search) systematically induce severe overconfidence due to inherent noise in retrieved information, while verification tools (e.g., code interpreters) can ground reasoning through deterministic feedback and mitigate miscalibration. To robustly improve calibration across tool types, we propose a reinforcement learning (RL) fine-tuning framework that jointly optimizes task accuracy and calibration, supported by a holistic benchmark of reward designs. We demonstrate that our trained agents not only achieve superior calibration but also exhibit robust generalization from local training environments to noisy web settings and to distinct domains such as mathematical reasoning. Our results highlight the necessity of domain-specific calibration strategies for tool-use agents. More broadly, this work establishes a foundation for building self-aware agents that can reliably communicate uncertainty in high-stakes, real-world deployments.
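To make the two objectives described above concrete, the sketch below shows one plausible way a scalar reward could jointly score task accuracy and verbalized calibration, alongside the standard expected calibration error (ECE) metric commonly used to evaluate calibration. This is a minimal illustration under our own assumptions: the function names, the Brier-style penalty, and the `lambda_cal` weight are hypothetical and are not taken from the paper's actual reward designs.

```python
import numpy as np

def calibration_aware_reward(correct: bool, confidence: float,
                             lambda_cal: float = 0.5) -> float:
    """Illustrative reward combining task accuracy with a Brier-style calibration term.

    `correct` is whether the agent's final answer was right; `confidence` is the
    verbalized confidence in [0, 1] it reported. `lambda_cal` is a hypothetical
    weight trading off accuracy against calibration.
    """
    accuracy_term = 1.0 if correct else 0.0
    # Penalize the squared gap between stated confidence and the actual outcome.
    brier_penalty = (confidence - accuracy_term) ** 2
    return accuracy_term - lambda_cal * brier_penalty


def expected_calibration_error(confidences, outcomes, n_bins: int = 10) -> float:
    """Standard ECE over equal-width confidence bins (evaluation metric, not a reward)."""
    confidences = np.asarray(confidences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Include the lower edge only for the first bin so every sample falls in one bin.
        mask = ((confidences >= lo) if lo == 0.0 else (confidences > lo)) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - outcomes[mask].mean())
            ece += mask.mean() * gap
    return ece
```

For instance, a correct answer stated with 0.6 confidence scores 1 − 0.5·(0.6 − 1)² = 0.92, while a wrong answer stated with 0.9 confidence scores −0.405, so an agent optimizing such a reward is pushed toward confidence values that track its actual hit rate rather than toward uniform overconfidence.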