Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Abstract

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are, for each attention head, the ratio of attention weights on the context versus newly generated tokens. We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector, Lookback Lens, is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
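As a rough illustration of the idea, the sketch below computes a per-head lookback ratio from one decoding step's attention weights, scores the resulting features with a linear classifier, and uses that score to choose among candidate continuations. It is a minimal sketch under stated assumptions, not the paper's implementation: the normalisation `context / (context + generated)` over mean attention weights is one plausible reading of "ratio", and the function names, classifier weights, and candidate format are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def lookback_ratio(attn_row: Sequence[float], num_context: int) -> float:
    """Share of one head's attention that falls on the context.

    attn_row holds the head's attention weights at one decoding step, ordered
    as [context tokens..., previously generated tokens...].
    Assumption: mean weight per span, normalised as ctx / (ctx + new).
    """
    context = attn_row[:num_context]
    generated = attn_row[num_context:]
    ctx_mean = sum(context) / len(context) if context else 0.0
    new_mean = sum(generated) / len(generated) if generated else 0.0
    total = ctx_mean + new_mean
    return ctx_mean / total if total > 0 else 0.0


def head_features(step_attn: Sequence[Sequence[float]], num_context: int) -> List[float]:
    """One lookback ratio per attention head for a single decoding step."""
    return [lookback_ratio(row, num_context) for row in step_attn]


def grounded_probability(features: Sequence[float],
                         weights: Sequence[float],
                         bias: float) -> float:
    """Logistic score of a linear classifier (weights here are placeholders)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick_candidate(candidates: Sequence[Tuple[str, Sequence[float]]],
                   weights: Sequence[float],
                   bias: float) -> str:
    """Classifier-guided choice: keep the candidate text with the highest score."""
    best = max(candidates, key=lambda c: grounded_probability(c[1], weights, bias))
    return best[0]


if __name__ == "__main__":
    # Two heads, two context tokens, two generated tokens (toy numbers).
    step = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.4, 0.1, 0.1]]
    feats = head_features(step, num_context=2)
    print(feats)
    print(grounded_probability(feats, weights=[1.0, 1.0], bias=-1.0))
```

In practice the attention weights would come from the LLM itself, and the classifier weights would be fit on spans labelled as grounded or hallucinated; both are stubbed out here with toy values.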
