
Agentic Uncertainty Quantification

January 22, 2026
Authors: Jiaxin Zhang, Prafulla Kumar Choubey, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu
cs.AI

Abstract

Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," in which early epistemic errors propagate irreversibly through later steps. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors that diagnose risks without addressing them, while self-reflection mechanisms suffer from continuous or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. The architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which uses these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This lets the agent dynamically balance efficient execution against deep deliberation. Extensive experiments on closed-loop benchmarks and open-ended deep research tasks demonstrate that our training-free approach achieves superior performance and trajectory-level calibration. We believe AUQ, as a principled framework, represents a significant step towards reliable agents.
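The dual-process loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Step`, `UncertaintyAwareMemory`, and `run_step`, the `propose` and `reflect` callables (standing in for model calls), and the fixed confidence threshold of 0.7 are all hypothetical. In particular, the abstract does not say how "only when necessary" is decided; a simple threshold is assumed here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str
    confidence: float  # verbalized confidence, assumed to lie in [0, 1]
    explanation: str   # verbalized reason for the stated confidence


@dataclass
class UncertaintyAwareMemory:
    """System 1 sketch: keeps confidence and explanation next to each action."""
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def as_context(self) -> str:
        # Later steps see earlier uncertainty instead of bare actions.
        return "\n".join(
            f"[{i}] {s.action} (confidence={s.confidence:.2f}; {s.explanation})"
            for i, s in enumerate(self.steps)
        )


def run_step(
    propose: Callable[[str], Step],
    reflect: Callable[[str, Step], Step],
    memory: UncertaintyAwareMemory,
    threshold: float = 0.7,  # assumed trigger rule, not taken from the paper
) -> Tuple[Step, bool]:
    """Propose a step; reflect (System 2 sketch) only if confidence is low."""
    context = memory.as_context()
    step = propose(context)
    reflected = False
    if step.confidence < threshold:
        # The explanation is passed along as the cue for targeted reflection.
        step = reflect(context, step)
        reflected = True
    memory.add(step)
    return step, reflected
```

In this sketch, a confident step is stored directly, while a low-confidence step is routed through `reflect` before it enters memory, so the extra deliberation cost is paid only on uncertain steps.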