CalVerT: Augmenting Agents with Calibrated Verifier Telemetry to Improve Acting and Learning on Knowledge-Intensive Tasks

Abstract

LLM agents in knowledge-intensive question answering take retrieval and reasoning actions with incomplete knowledge of whether their current answer is uncertain, unsupported, or already complete. This produces two failure modes: committing to confident but unsupported answers, which hurts accuracy, and over-retrieving when the evidence in hand already suffices, which wastes compute. To give agents a more complete picture of the state space they operate in, we introduce calibrated verifier telemetry (CalVerT), which augments the agent's state with two additional signals: a calibrated self-confidence score and a grounding verifier score. We show that CalVerT improves agents in both training-free and training-based settings. On four QA benchmarks, CalVerT raises F1 by triggering retrieval when agents over-rely on parametric knowledge and by cutting redundant retrieval when agents already have sufficient context to answer. CalVerT can augment existing QA frameworks without any training. It also improves trained systems: simply augmenting an agent's state with telemetry yields gains after reinforcement learning over an agent that receives identical training but no CalVerT telemetry.
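
To make the idea of a telemetry-augmented state concrete, the following is a minimal sketch, not the authors' implementation. It assumes both scores lie in [0, 1] and are computed elsewhere. The names `Telemetry`, `augment_state`, and `choose_action`, the text format of the telemetry line, and the thresholds `conf_min` and `ground_min` are all hypothetical. In particular, the fixed threshold rule is only a stand-in for the agent's own policy, which in CalVerT reads the telemetry and decides for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Telemetry:
    """Hypothetical container for the two telemetry signals."""
    confidence: float  # calibrated self-confidence, assumed to lie in [0, 1]
    grounding: float   # grounding verifier score, assumed to lie in [0, 1]


def augment_state(state: str, t: Telemetry) -> str:
    """Append the telemetry to the agent's textual state (format is illustrative)."""
    return (
        f"{state}\n"
        f"[telemetry] confidence={t.confidence:.2f} grounding={t.grounding:.2f}"
    )


def choose_action(t: Telemetry, conf_min: float = 0.7, ground_min: float = 0.5) -> str:
    """Stand-in policy: answer only when the draft is both confident and grounded.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if t.confidence >= conf_min and t.grounding >= ground_min:
        return "answer"
    return "retrieve"


if __name__ == "__main__":
    # Confident but unsupported draft: the first failure mode, so retrieve.
    print(choose_action(Telemetry(confidence=0.9, grounding=0.2)))
    # Confident and supported draft: stop retrieving and answer.
    print(choose_action(Telemetry(confidence=0.9, grounding=0.8)))
    print(augment_state("Q: Who wrote Dune?", Telemetry(0.9, 0.8)))
```

Under these assumptions, a high-confidence but poorly grounded draft triggers retrieval, while a draft that is both confident and grounded is answered without a further retrieval call. This mirrors the two failure modes described above.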