Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
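
The lookback ratio feature described above can be illustrated with a minimal sketch. The abstract specifies only a per-head ratio of attention on context tokens versus newly generated tokens, so the details here are assumptions: the function name `lookback_ratio`, the input layout (one decoding step's attention weights, one row per head), the use of mean attention over each span, and the normalization context / (context + new).

```python
import numpy as np

def lookback_ratio(attn, n_context):
    """Per-head lookback ratio for a single decoding step (illustrative sketch).

    attn: array of shape (num_heads, seq_len) with attention weights over all
          previous tokens; the first n_context positions are context tokens,
          the remaining positions are newly generated tokens (assumed layout).
    Returns an array of shape (num_heads,) with values in [0, 1].
    """
    attn = np.asarray(attn, dtype=float)
    # Mean attention mass placed on the provided context.
    ctx = attn[:, :n_context].mean(axis=1)
    # Mean attention mass placed on the model's own generated tokens.
    if attn.shape[1] > n_context:
        new = attn[:, n_context:].mean(axis=1)
    else:
        new = np.zeros(attn.shape[0])
    denom = ctx + new
    # Assumed normalization: context / (context + new); 0 where both are 0.
    return np.divide(ctx, denom, out=np.zeros_like(ctx), where=denom > 0)

# Toy example: 2 heads, 2 context tokens followed by 2 generated tokens.
toy_attn = [[0.1, 0.1, 0.4, 0.4],   # head 0 mostly attends to its own output
            [0.4, 0.4, 0.1, 0.1]]   # head 1 mostly looks back at the context
ratios = lookback_ratio(toy_attn, n_context=2)  # roughly 0.2 and 0.8
```

Under these assumptions, the ratios from all heads (and layers) would be concatenated into one feature vector and passed to a linear classifier, which is the detector the abstract calls Lookback Lens.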
