Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
July 9, 2024
Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass
cs.AI
Abstract
When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
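The lookback ratio described in the abstract can be computed directly from attention maps: for each attention head, the average attention weight placed on context tokens divided by the total average weight placed on context plus newly generated tokens. Below is a minimal sketch of that computation together with a linear probe, assuming attention tensors are already available (e.g. from a Hugging Face model run with output_attentions=True). The function names, tensor shapes, and the logistic-regression classifier here are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of lookback-ratio features plus a
# linear probe, as suggested by the abstract. Shapes below are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression


def lookback_ratios(attentions, context_len):
    """Compute per-head lookback ratios for each generated token.

    attentions: list over generation steps; attentions[t] is a list over
                layers, each an array of assumed shape
                (num_heads, 1, context_len + t + 1) holding the attention
                weights of the token generated at step t.
    context_len: number of prompt/context tokens.
    Returns an array of shape (num_steps, num_layers * num_heads), where each
    entry is A_context / (A_context + A_new) for one attention head.
    """
    feats = []
    for per_layer in attentions:
        step = []
        for layer_attn in per_layer:
            attn = np.asarray(layer_attn)[:, 0, :]       # (num_heads, seq_len)
            a_ctx = attn[:, :context_len].mean(axis=1)   # mean weight on context
            new_part = attn[:, context_len:]             # weight on generated tokens
            a_new = (new_part.mean(axis=1) if new_part.shape[1] > 0
                     else np.zeros_like(a_ctx))
            step.append(a_ctx / (a_ctx + a_new + 1e-9))
        feats.append(np.concatenate(step))
    return np.stack(feats)


def train_lookback_lens(X, y):
    """Fit a linear classifier on span-averaged lookback-ratio features.

    X: one averaged feature vector per labelled span.
    y: 1 = faithful to the context, 0 = hallucinated (labels would come from
       an annotated summarization/QA dataset; hypothetical here).
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf
```

A span-level feature vector would then be the mean of the per-step vectors over the tokens in that span, and the classifier's predicted probability is one plausible score for the classifier-guided decoding the abstract mentions, e.g. for ranking candidate continuations.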