Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
June 22, 2024
Authors: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal
cs.AI
Abstract
We propose semantic entropy probes (SEPs), a cheap and reliable method for
uncertainty quantification in Large Language Models (LLMs). Hallucinations,
which are plausible-sounding but factually incorrect and arbitrary model
generations, present a major challenge to the practical adoption of LLMs.
Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can
detect hallucinations by estimating uncertainty in the space of semantic meaning
for a set of model generations. However, the 5-to-10-fold increase in
computation cost associated with SE computation hinders practical adoption. To
address this, we propose SEPs, which directly approximate SE from the hidden
states of a single generation. SEPs are simple to train and do not require
sampling multiple model generations at test time, reducing the overhead of
semantic uncertainty quantification to almost zero. We show that SEPs retain
high performance for hallucination detection and generalize better to
out-of-distribution data than previous probing methods that directly predict
model accuracy. Our results across models and tasks suggest that model hidden
states capture SE, and our ablation studies give further insights into the
token positions and model layers for which this is the case.
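The probing idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the synthetic hidden states and SE scores are stand-ins for real LLM activations and semantic entropy computed over sampled generations, and logistic regression is used as a simple linear probe consistent with the paper's setup of predicting binarized SE from hidden states.

```python
# Illustrative sketch of a semantic entropy probe (SEP): a linear
# classifier trained on hidden states to predict whether semantic
# entropy (SE) for an input would be high or low. Hidden states and
# SE scores below are synthetic stand-ins; in practice they would
# come from an LLM and from SE estimated over multiple sampled
# generations (Farquhar et al., 2024).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_train, hidden_dim = 500, 64
# Synthetic "hidden states" (one vector per prompt/generation).
H = rng.normal(size=(n_train, hidden_dim))
# Synthetic SE scores correlated with one hidden-state direction.
w_true = rng.normal(size=hidden_dim)
se_scores = H @ w_true + rng.normal(scale=0.5, size=n_train)

# Binarize SE at the median: 1 = high semantic entropy (a likely
# hallucination), 0 = low. The probe is then a binary classifier.
labels = (se_scores > np.median(se_scores)).astype(int)

probe = LogisticRegression(max_iter=1000).fit(H, labels)

# At test time, a single forward pass yields the hidden state; the
# probe's probability acts as a cheap SE proxy with no sampling of
# multiple generations, which is where the cost saving comes from.
h_new = rng.normal(size=(1, hidden_dim))
p_high_se = probe.predict_proba(h_new)[0, 1]
print(f"predicted P(high SE) = {p_high_se:.3f}")
```

Because the probe needs only the hidden state of the single generation already being produced, the overhead at test time is one small matrix-vector product, versus the 5-to-10x cost of sampling generations for exact SE.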