

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

September 26, 2023
Authors: Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
cs.AI

Abstract

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as Constraint Satisfaction Problems and use this framework to investigate how the model interacts internally with factual constraints. Specifically, we discover a strong positive relationship between the model's attention to constraint tokens and the factual accuracy of its responses. On a curated suite of 11 datasets with over 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes self-attention patterns to predict constraint satisfaction and factual errors, enabling early error identification. The approach and findings demonstrate how a mechanistic understanding of factuality in LLMs can enhance reliability.
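
The abstract does not spell out implementation details, but the core idea (measure how much attention the model assigns to the constraint tokens and feed that signal to a simple classifier) can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual SAT Probe code: the checkpoint id, the find_token_span helper, the placeholder prompts/constraints/labels data, and the logistic-regression probe are all assumptions made for the example.

```python
# Sketch: per-(layer, head) attention mass from the last prompt token to the
# constraint tokens, used as features for a linear probe that predicts
# whether the model's answer will be factually correct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    output_attentions=True,
    attn_implementation="eager",  # attention weights are only returned by the eager path
).eval()


def constraint_attention_features(prompt: str, constraint: str) -> torch.Tensor:
    """Attention from the final prompt token to the constraint tokens,
    aggregated per (layer, head); returned as a flat feature vector."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Hypothetical helper: locate the constraint's token span inside the prompt.
    start, end = find_token_span(ids[0], tokenizer, constraint)
    with torch.no_grad():
        out = model(ids)
    feats = []
    for layer_attn in out.attentions:  # each: (batch, heads, seq, seq)
        # attention of the last prompt token to the constraint tokens, summed
        mass = layer_attn[0, :, -1, start:end].sum(dim=-1)  # (heads,)
        feats.append(mass.float())
    return torch.cat(feats)  # (layers * heads,)


# prompts, constraints, and 0/1 correctness labels are placeholder data;
# a plain logistic-regression probe stands in for SAT Probe here.
X = torch.stack(
    [constraint_attention_features(p, c) for p, c in zip(prompts, constraints)]
)
probe = LogisticRegression(max_iter=1000).fit(X.numpy(), labels)
```

Because the features are indexed by (layer, head), the fitted probe weights also indicate which attention heads carry the constraint-satisfaction signal, which is in the spirit of the early-error-identification result described in the abstract.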