

Agentic Confidence Calibration

January 22, 2026
Authors: Jiaxin Zhang, Caiming Xiong, Chien-Sheng Wu
cs.AI

Abstract

AI agents are rapidly advancing from passive language models to autonomous systems executing complex, multi-step tasks. Yet their overconfidence in failure scenarios remains a fundamental barrier to deployment in high-stakes settings. Existing calibration methods, built for static single-turn outputs, cannot address the unique challenges of agentic systems, such as compounding errors along trajectories, uncertainty from external tools, and opaque failure modes. To address these challenges, we introduce, for the first time, the problem of Agentic Confidence Calibration and propose Holistic Trajectory Calibration (HTC), a novel diagnostic framework that extracts rich process-level features, ranging from macro dynamics to micro stability, across an agent's entire trajectory. Powered by a simple, interpretable model, HTC consistently surpasses strong baselines in both calibration and discrimination across eight benchmarks, multiple LLMs, and diverse agent frameworks. Beyond performance, HTC delivers three essential advances: it provides interpretability by revealing the signals behind failure, enables transferability by applying across domains without retraining, and achieves generalization through a General Agent Calibrator (GAC) that attains the best calibration (lowest ECE) on the out-of-domain GAIA benchmark. Together, these contributions establish a new process-centric paradigm for confidence calibration, providing a framework for diagnosing and enhancing the reliability of AI agents.
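The abstract's headline calibration metric is ECE (Expected Calibration Error), which is standard and can be computed with equal-width confidence bins. Below is a minimal sketch pairing that standard ECE computation with a hypothetical "simple, interpretable" calibrator in the spirit the abstract describes; the logistic-regression choice, the feature names, and the placeholder data are illustrative assumptions, not the paper's actual HTC implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard binned ECE: bin-weight-averaged |accuracy - mean confidence|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        # Half-open bins; the last bin also includes conf == 1.0.
        if i < n_bins - 1:
            in_bin = (conf >= edges[i]) & (conf < edges[i + 1])
        else:
            in_bin = (conf >= edges[i]) & (conf <= edges[i + 1])
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight gap by bin's share of samples
    return ece

# --- Hypothetical calibrator sketch (assumed, not the paper's method) ---
# Feature columns stand in for "process-level features", e.g. step count,
# mean per-step log-probability, tool-error rate, late-trajectory stability.
rng = np.random.default_rng(0)
X = rng.random((200, 4))             # placeholder trajectory features
y = rng.integers(0, 2, size=200)     # 1 = trajectory succeeded

calibrator = LogisticRegression().fit(X, y)      # simple, interpretable model
confidence = calibrator.predict_proba(X)[:, 1]   # calibrated success probability
print(f"ECE: {expected_calibration_error(confidence, y):.3f}")
```

A linear model over named trajectory features keeps the calibrator inspectable: each learned coefficient indicates how strongly a given process signal pushes predicted confidence up or down, which matches the interpretability claim in the abstract.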