Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
July 9, 2024
Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass
cs.AI
Abstract
When asked to summarize articles or answer questions given a passage, large
language models (LLMs) can hallucinate details and respond with unsubstantiated
answers that are inaccurate with respect to the input context. This paper
describes a simple approach for detecting such contextual hallucinations. We
hypothesize that contextual hallucinations are related to the extent to which
an LLM attends to information in the provided context versus its own
generations. Based on this intuition, we propose a simple hallucination
detection model whose input features are given by the ratio of attention
weights on the context versus newly generated tokens (for each attention head).
We find that a linear classifier based on these lookback ratio features is as
effective as a richer detector that utilizes the entire hidden states of an LLM
or a text-based entailment model. The lookback ratio-based detector -- Lookback
Lens -- is found to transfer across tasks and even models, allowing a detector
that is trained on a 7B model to be applied (without retraining) to a larger
13B model. We further apply this detector to mitigate contextual
hallucinations, and find that a simple classifier-guided decoding approach is
able to reduce the amount of hallucination, for example by 9.6% in the XSum
summarization task.
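To make the key feature concrete, below is a minimal sketch (not the authors' released code) of how a per-head lookback ratio could be computed from attention maps and fed to a linear detector. The array shapes, the `lookback_ratio` helper, the synthetic attention maps, and the placeholder labels are all illustrative assumptions; in practice the attention weights would come from the LLM itself and the labels from annotated hallucination spans.

```python
# Sketch of lookback-ratio features + a linear hallucination detector.
# Assumptions: synthetic attention maps stand in for a real model's weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

def lookback_ratio(attn, num_context_tokens):
    """attn: [num_heads, num_new_tokens, seq_len] attention weights of the
    newly generated tokens. Returns one feature per head: attention mass on
    the provided context divided by total mass on context plus previously
    generated tokens, averaged over the generated span."""
    ctx = attn[:, :, :num_context_tokens].sum(axis=-1)   # [heads, new_tokens]
    gen = attn[:, :, num_context_tokens:].sum(axis=-1)   # [heads, new_tokens]
    return (ctx / (ctx + gen + 1e-9)).mean(axis=-1)      # [heads]

rng = np.random.default_rng(0)
num_heads, num_new, num_ctx = 32, 8, 100                 # illustrative sizes

def synthetic_attention():
    # Row-normalized random maps standing in for a real model's attention.
    a = rng.random((num_heads, num_new, num_ctx + num_new))
    return a / a.sum(axis=-1, keepdims=True)

# Per-head lookback ratios for 20 synthetic generation spans.
X = np.stack([lookback_ratio(synthetic_attention(), num_ctx) for _ in range(20)])
y = np.tile([0, 1], 10)   # placeholder labels: 1 = span judged hallucinated

clf = LogisticRegression().fit(X, y)                     # the linear detector
score = clf.predict_proba(lookback_ratio(synthetic_attention(), num_ctx)[None])[:, 1]
print(f"hallucination score: {score[0]:.3f}")
```

The same per-head scores could, in principle, be computed on the fly during decoding and used to rerank candidate continuations, which is the spirit of the classifier-guided decoding the abstract describes.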