

Agentic Confidence Calibration

January 22, 2026
Authors: Jiaxin Zhang, Caiming Xiong, Chien-Sheng Wu
cs.AI

Abstract

AI agents are rapidly advancing from passive language models to autonomous systems executing complex, multi-step tasks. Yet their overconfidence in failure remains a fundamental barrier to deployment in high-stakes settings. Existing calibration methods, built for static single-turn outputs, cannot address the unique challenges of agentic systems, such as compounding errors along trajectories, uncertainty from external tools, and opaque failure modes. To address these challenges, we introduce, for the first time, the problem of Agentic Confidence Calibration and propose Holistic Trajectory Calibration (HTC), a novel diagnostic framework that extracts rich process-level features ranging from macro dynamics to micro stability across an agent's entire trajectory. Powered by a simple, interpretable model, HTC consistently surpasses strong baselines in both calibration and discrimination, across eight benchmarks, multiple LLMs, and diverse agent frameworks. Beyond performance, HTC delivers three essential advances: it provides interpretability by revealing the signals behind failure, enables transferability by applying across domains without retraining, and achieves generalization through a General Agent Calibrator (GAC) that achieves the best calibration (lowest ECE) on the out-of-domain GAIA benchmark. Together, these contributions establish a new process-centric paradigm for confidence calibration, providing a framework for diagnosing and enhancing the reliability of AI agents.
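The abstract reports calibration quality via Expected Calibration Error (ECE): the weighted average gap between an agent's stated confidence and its empirical success rate within confidence bins. A minimal sketch in plain Python follows; the equal-width binning scheme and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: for each equal-width confidence bin, take the absolute gap
    between mean confidence and empirical accuracy, weighted by the
    fraction of samples falling in that bin."""
    assert len(confidences) == len(correct)
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # half-open bins (lo, hi]; the first bin also includes 0.0
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if idx:
            avg_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# Toy example of the overconfidence the abstract describes:
# the agent claims ~90% confidence but succeeds on only half its tasks.
confs = [0.90, 0.95, 0.85, 0.90]  # stated confidence per trajectory
hits = [1, 0, 0, 1]               # actual task success (1) / failure (0)
print(round(expected_calibration_error(confs, hits), 3))  # → 0.4
```

A perfectly calibrated agent (confidence always matching accuracy) scores an ECE of 0; the large gap here is what a process-level calibrator like HTC aims to shrink.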