Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Abstract

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We model factual queries as constraint satisfaction problems and use this framework to investigate how the model interacts internally with factual constraints. Specifically, we find a strong positive relationship between the model's attention to constraint tokens and the factual accuracy of its responses. Using our curated suite of 11 datasets with over 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes self-attention patterns to predict constraint satisfaction and factual errors and that allows early error identification. The approach and findings demonstrate how a mechanistic understanding of factuality in LLMs can be used to enhance reliability.
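The idea of probing attention to constraint tokens can be illustrated with a minimal sketch. This is not the authors' SAT Probe implementation. It assumes that the probe features are the attention mass from the last prompt token to the constraint tokens, one value per layer and head, and that a plain logistic-regression probe is fit on them. The attention tensors are synthetic, and all function names are hypothetical.

```python
import numpy as np

def constraint_attention_features(attn, constraint_idx):
    """attn: weights of shape [layers, heads, seq, seq] for one prompt.
    Returns one feature per (layer, head): the attention mass that the
    last token places on the constraint tokens (assumed feature choice)."""
    last_row = attn[:, :, -1, :]                 # [layers, heads, seq]
    mass = last_row[:, :, constraint_idx].sum(axis=-1)
    return mass.reshape(-1)

def fit_linear_probe(X, y, lr=0.5, steps=500, l2=1e-3):
    """Logistic regression trained with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict(X, w, b):
    """Hard 0/1 prediction: 1 means 'constraint predicted satisfied'."""
    return (X @ w + b > 0).astype(int)

def synthetic_example(rng, label, layers=2, heads=4, seq=8, constraint_idx=(2, 3)):
    """Synthetic attention: factually correct examples (label 1) get extra
    attention logits on constraint tokens. Purely illustrative data."""
    logits = rng.normal(size=(layers, heads, seq, seq))
    if label == 1:
        logits[..., list(constraint_idx)] += 2.0
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)     # softmax over key positions
    return constraint_attention_features(attn, list(constraint_idx))

def demo(seed=0, n=400):
    """Fits the probe on 75% of the synthetic data, returns held-out accuracy."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = np.stack([synthetic_example(rng, label) for label in y])
    split = int(0.75 * n)
    w, b = fit_linear_probe(X[:split], y[:split].astype(float))
    return float(np.mean(predict(X[split:], w, b) == y[split:]))

if __name__ == "__main__":
    print(f"held-out probe accuracy on synthetic data: {demo():.2f}")
```

With a real model, the synthetic tensors would be replaced by attention weights extracted from the model for each prompt, and the labels by whether the generated answer satisfies the constraint.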
