Quickest Detection of Hallucination Onset: Delay Bounds and a Learned CUSUM Statistic

Abstract

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment. At a matched false-alarm rate it detects in 11 to 13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, and the remainder is a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
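
To make the CUSUM view concrete, the following is a minimal sketch of Page's recursion applied to a stream of per-token increments, together with the delay measured from a known onset. It is an illustration under stated assumptions, not the paper's implementation: the function names `cusum_alarm` and `detection_delay`, the toy increments, the threshold, and the convention that delay equals the alarm index minus the onset index are all hypothetical. In the setting described above, the increment would come from the learned labeler rather than from fixed numbers.

```python
from typing import Optional, Sequence


def cusum_alarm(increments: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first token at which the CUSUM statistic
    reaches `threshold`, or None if it never does."""
    stat = 0.0
    for t, inc in enumerate(increments):
        # Page's recursion: accumulate evidence and clip at zero, so that
        # evidence for the faithful state cannot build up a negative reserve.
        stat = max(0.0, stat + inc)
        if stat >= threshold:
            return t
    return None


def detection_delay(
    increments: Sequence[float], threshold: float, onset: int
) -> Optional[int]:
    """Tokens between the onset index and the alarm index (assumed convention).

    Returns None when no alarm fires or when the alarm precedes the onset,
    which counts as a false alarm rather than a detection.
    """
    alarm = cusum_alarm(increments, threshold)
    if alarm is None or alarm < onset:
        return None
    return alarm - onset


# Toy stream (hypothetical values): negative increments while the text is
# faithful, positive increments once the hallucination starts at token 5.
toy_increments = [-0.5] * 5 + [1.0] * 5
alarm_index = cusum_alarm(toy_increments, threshold=2.5)
delay = detection_delay(toy_increments, threshold=2.5, onset=5)
print(alarm_index, delay)
```

Raising the threshold lowers the false-alarm rate at the cost of a longer delay, which is why comparisons between detectors are made at a matched false-alarm rate.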