Quickest Detection of Hallucination Onset: Delay Bounds and a Learned CUSUM Statistic

Abstract

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment. At a matched false-alarm rate it detects in 11 to 13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap between the measured delay and the bound: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, and the rest of the gap is a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
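To make the delay criterion concrete, the following is a minimal sketch of a textbook CUSUM detector and of how detection delay could be measured on a token stream. It is an illustration under stated assumptions, not the paper's method: the per-token increment here is the exact log-likelihood ratio of a synthetic Gaussian mean shift, whereas the paper's increment is learned by a recurrent labeler. The function names (`cusum_alarm_index`, `detection_delay`, `gaussian_llr_scores`), the threshold, and the shift size are all hypothetical choices made for this example.

```python
import numpy as np


def cusum_alarm_index(scores, threshold):
    """Run the CUSUM recursion W_t = max(0, W_{t-1} + s_t).

    Returns the index of the first token at which W_t >= threshold,
    or None if the statistic never crosses the threshold.
    """
    w = 0.0
    for t, s in enumerate(scores):
        w = max(0.0, w + float(s))
        if w >= threshold:
            return t
    return None


def detection_delay(alarm_index, onset_index):
    """Tokens elapsed from onset to alarm.

    Returns None when there is no alarm or when the alarm fires
    before the onset (a false alarm, which has no delay).
    """
    if alarm_index is None or alarm_index < onset_index:
        return None
    return alarm_index - onset_index


def gaussian_llr_scores(x, shift):
    """Per-token log-likelihood ratio of N(shift, 1) against N(0, 1).

    Assumption for this sketch only: the pre-change and post-change
    distributions are known unit-variance Gaussians.
    """
    return shift * np.asarray(x) - 0.5 * shift**2


if __name__ == "__main__":
    # Synthetic stream: "faithful" tokens ~ N(0, 1), then a mean shift at onset.
    rng = np.random.default_rng(0)
    onset, length, shift = 50, 200, 1.0
    x = rng.normal(0.0, 1.0, size=length)
    x[onset:] += shift

    alarm = cusum_alarm_index(gaussian_llr_scores(x, shift), threshold=5.0)
    print("alarm index:", alarm, "delay:", detection_delay(alarm, onset))
```

Raising the threshold lowers the false-alarm rate at the cost of a longer delay. That trade-off is why the delays quoted above are compared at a matched false-alarm rate.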