Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

July 9, 2024
Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass
cs.AI

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
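The lookback ratio described above can be illustrated with a minimal sketch: for each attention head, sum the attention mass placed on context tokens versus newly generated tokens and take their ratio. This is not the paper's released code; the array layout, function name, and toy dimensions are illustrative assumptions (the paper computes this per layer and head across the full generation).

```python
import numpy as np

def lookback_ratio(attn, context_len):
    """Per-head lookback ratio for one decoding step (illustrative sketch).

    attn: array of shape (num_heads, seq_len) holding the attention
          weights of the current generated token over all prior tokens.
    context_len: number of tokens that belong to the provided context.
    Returns one ratio in [0, 1] per head: attention mass on the context
    divided by total mass on context plus generated tokens.
    """
    ctx = attn[:, :context_len].sum(axis=-1)   # mass on context tokens
    new = attn[:, context_len:].sum(axis=-1)   # mass on generated tokens
    return ctx / (ctx + new + 1e-12)           # small epsilon avoids 0/0

# Toy example: 4 heads, a 10-token sequence whose first 6 tokens are context.
rng = np.random.default_rng(0)
attn = rng.random((4, 10))
attn /= attn.sum(axis=-1, keepdims=True)       # normalize rows like softmax
feats = lookback_ratio(attn, context_len=6)    # one feature per head
```

Concatenating these per-head ratios across layers and decoding steps yields the feature vector on which the paper's linear classifier (the Lookback Lens) is trained.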
