Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Abstract

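We investigate the internal behavior of Transformer-based large language models (LLMs) when they generate factually incorrect text. We model factual queries as constraint satisfaction problems (CSPs) and use this framework to examine how the model interacts internally with factual constraints. In particular, we find a strong positive relationship between the model's attention to constraint tokens and the factual accuracy of its responses. Using a curated suite of 11 datasets containing more than 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family at all scales (7B, 13B, and 70B). We propose SAT Probe, a method that probes self-attention patterns to predict constraint satisfaction and factual errors and that enables early error identification. The approach and findings show how a mechanistic understanding of factuality in LLMs can be used to improve reliability.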
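The abstract does not spell out how SAT Probe is computed, so the following is only a minimal sketch of the general idea of probing attention to constraint tokens. It assumes, purely for illustration, that each prompt yields one feature per (layer, head), namely the attention mass that the final prompt token places on the constraint tokens, and that a plain logistic-regression probe is fit on those features to predict whether the answer is factually correct. The function names, the feature definition, and the training loop are hypothetical and are not taken from the paper.

```python
import math


def constraint_attention_features(attn, constraint_idx):
    """Assumed feature map (not the paper's definition).

    attn[layer][head] is a list of attention weights from the final prompt
    token to each prompt position. Returns one value per (layer, head):
    the total attention mass placed on the constraint token positions.
    """
    return [sum(head[i] for i in constraint_idx)
            for layer in attn for head in layer]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability that the response satisfies the constraint."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic_probe(X, y, lr=0.5, steps=300):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, w, b) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

In use, one would build a feature vector per prompt with `constraint_attention_features`, label each prompt by whether the model's answer was correct, fit the probe on a training split, and threshold `predict_proba` on held-out prompts. A linear probe is chosen here only because it is the simplest way to test whether the attention features carry a signal.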

