Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

June 10, 2026
Author: Igor Itkin
cs.AI

Abstract

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment; at a matched false-alarm rate it detects in 11-13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
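
To make the delay metric concrete, here is a minimal sketch of the classical CUSUM recursion W_t = max(0, W_{t-1} + s_t) with an alarm at the first t where W_t reaches a threshold h. It is an illustration under stated assumptions, not the paper's implementation: the names `cusum_alarm` and `detection_delay`, the hand-picked scores, and the threshold are all hypothetical, and the per-token increment s_t stands in for either a log-likelihood ratio or a learned score. Delay is counted here as alarm index minus onset index, which is one possible convention.

```python
from typing import Optional, Sequence


def cusum_alarm(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first token where the CUSUM statistic
    reaches `threshold`, or None if no alarm is raised."""
    w = 0.0
    for t, s in enumerate(scores):
        # Classical CUSUM: accumulate the increment, reset at zero.
        w = max(0.0, w + s)
        if w >= threshold:
            return t
    return None


def detection_delay(
    scores: Sequence[float], onset: int, threshold: float
) -> Optional[int]:
    """Tokens elapsed between the onset and the alarm (assumed convention:
    alarm index minus onset index). Returns None when no alarm fires or
    the alarm fires before the onset (a false alarm)."""
    alarm = cusum_alarm(scores, threshold)
    if alarm is None or alarm < onset:
        return None
    return alarm - onset


if __name__ == "__main__":
    # Hypothetical increments: negative while faithful, positive after onset.
    scores = [-1.0] * 5 + [0.8] * 10
    print(cusum_alarm(scores, threshold=2.0))
    print(detection_delay(scores, onset=5, threshold=2.0))
```

Raising the threshold lowers the false-alarm rate at the cost of a longer delay, which is why the comparison in the abstract is made at a matched false-alarm rate.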