Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics
June 10, 2026
Author: Igor Itkin
cs.AI
Abstract
Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment. At a matched false-alarm rate it detects in 11-13 tokens, versus 31 tokens for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap to the lower bound: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, and the remainder is a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
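For readers unfamiliar with CUSUM, the recursion the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook statistic, not the paper's implementation: the function names `cusum_alarm` and `detection_delay`, the toy increments, the threshold, and the convention that delay equals alarm index minus onset index are all assumptions made here. In the classical setting the per-token increment is a log-likelihood ratio; in the paper's framing it would instead come from the learned labeler.

```python
from typing import Iterable, Optional


def cusum_alarm(increments: Iterable[float], threshold: float) -> Optional[int]:
    """Return the 0-based index of the first token at which the CUSUM
    statistic reaches `threshold`, or None if it never does.

    `increments` are per-token scores (hypothetical here): negative on
    average while the text is faithful, positive on average after onset.
    """
    stat = 0.0
    for t, inc in enumerate(increments):
        # Standard CUSUM recursion: accumulate evidence, reset at zero.
        stat = max(0.0, stat + inc)
        if stat >= threshold:
            return t
    return None


def detection_delay(alarm: Optional[int], onset: int) -> Optional[int]:
    """Tokens between onset and alarm (assumed convention: alarm - onset).
    Returns None for a missed detection or an alarm before onset."""
    if alarm is None or alarm < onset:
        return None
    return alarm - onset


# Toy stream: 5 faithful tokens, then hallucination onset at index 5.
toy_increments = [-0.5] * 5 + [1.0] * 10
alarm = cusum_alarm(toy_increments, threshold=3.0)
delay = detection_delay(alarm, onset=5)
```

Raising the threshold lowers the false-alarm rate at the cost of a longer delay, which is why the abstract compares detectors at a matched false-alarm rate.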