
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

June 22, 2024
作者: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal
cs.AI

Abstract

We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.
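To make the idea concrete, here is a minimal sketch of the probe setup the abstract describes: a linear probe trained to predict (binarized) semantic entropy from a single hidden state, so that no extra sampling is needed at test time. All data shapes and numbers below are synthetic placeholders; in practice the hidden states would come from an LLM forward pass and the SE targets from the multi-sample clustering procedure of Farquhar et al. (2024).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline training data: one hidden-state vector per prompt
# (e.g. a last-token activation from some layer), plus a semantic-entropy
# score computed once, offline, from several sampled generations.
n, d = 500, 32
H = rng.normal(size=(n, d))

# Synthetic SE targets: here we fabricate them as a linear function of the
# hidden state so the probe has signal to find; real SE comes from entropy
# over meaning-clusters of sampled generations.
w_true = rng.normal(size=d)
se = 1.0 / (1.0 + np.exp(-(H @ w_true)))

# Binarize SE at its median: high-SE outputs are hallucination-prone.
y = (se > np.median(se)).astype(float)

# Train a linear logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted P(high SE)
    g = p - y                               # gradient of the log-loss
    w -= 0.5 * H.T @ g / n
    b -= 0.5 * g.mean()

# Test time: score a single generation's hidden state, one dot product,
# with no multi-sample generation needed.
p_high_se = 1.0 / (1.0 + np.exp(-(H[0] @ w + b)))
acc = float((((H @ w + b) > 0).astype(float) == y).mean())
```

The probe itself is just a weight vector and a bias, which is why the abstract can claim near-zero overhead: hallucination scoring reduces to one inner product against activations the model already computed.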
