Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

Abstract

We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.
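
To make the idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that SE is estimated by counting how sampled answers fall into meaning clusters, that SE scores are binarized with a fixed threshold, and that the probe is a logistic regression on a hidden-state vector; the abstract specifies none of these details. The hidden states here are synthetic two-dimensional vectors, and all function names and parameters (semantic_entropy, train_probe, predict_high_se, se_threshold) are hypothetical.

```python
import math
import random
from collections import Counter


def semantic_entropy(cluster_ids):
    """Count-based entropy (in nats) over meaning clusters of sampled answers."""
    total = len(cluster_ids)
    probs = [count / total for count in Counter(cluster_ids).values()]
    return -sum(p * math.log(p) for p in probs)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_high_se(weights, bias, state):
    """Probe output: estimated probability that this hidden state has high SE."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, state)))


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(states[0]), len(states)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for state, label in zip(states, labels):
            err = predict_high_se(weights, bias, state) - label
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * state[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def run_demo(seed=0, n_examples=200, se_threshold=0.5):
    """Train on synthetic data and return the probe's training accuracy."""
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n_examples):
        uncertain = rng.random() < 0.5
        # Training time only: 10 sampled answers, already grouped by meaning.
        clusters = [rng.randrange(5) for _ in range(10)] if uncertain else [0] * 10
        # Synthetic 2-d "hidden state" whose first coordinate tracks uncertainty.
        states.append([rng.gauss(1.0 if uncertain else -1.0, 0.5), rng.gauss(0.0, 1.0)])
        labels.append(1 if semantic_entropy(clusters) > se_threshold else 0)
    weights, bias = train_probe(states, labels)
    # Test time: a single hidden state per query, no extra sampling.
    hits = sum((predict_high_se(weights, bias, s) > 0.5) == bool(y)
               for s, y in zip(states, labels))
    return hits / n_examples


print(f"probe training accuracy: {run_demo():.2f}")
```

The multi-sample cost appears only when building training labels. At test time the probe reads one hidden state per query, which is where the near-zero overhead described above would come from.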
