Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries

Abstract

To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Then, attention attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to using early decoding and causal tracing, we analyze how components use different tokens by introducing both Token Lens, which decodes aggregated attention updates from specified tokens, and a knockout method that analyzes changes in MLP outputs after removing attention to specified tokens. Overall, we provide new insights into how LMs' internal components interact with different input tokens to support complex factual recall. Code is available at https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
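The idea of decoding attention updates from specified tokens can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it assumes one reading of the description, in which the attention update at the final position is split by source token, summed over heads, and projected onto the vocabulary. The function names `matvec` and `token_lens`, the two-word vocabulary, and all numeric values are hypothetical and chosen only to make the promote-then-suppress pattern visible.

```python
# Toy sketch of a Token-Lens-style decomposition.
# ASSUMPTION: this is an illustrative reading of the abstract, not the paper's code.
# All names, shapes, and numbers below are hypothetical.

def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def token_lens(attn_weights, value_outputs, unembed, source_positions):
    """Decode the attention update contributed by selected source positions.

    attn_weights[h][j]:  attention from the final position to position j in head h.
    value_outputs[h][j]: head h's value vector at position j, assumed to be
                         already projected into the residual stream (length d_model).
    unembed:             d_model x vocab_size projection onto the vocabulary.
    source_positions:    the token positions whose contribution is aggregated.
    """
    d_model = len(unembed)
    update = [0.0] * d_model
    for head_weights, head_values in zip(attn_weights, value_outputs):
        for j in source_positions:
            for k in range(d_model):
                update[k] += head_weights[j] * head_values[j][k]
    return matvec(update, unembed)


# Hypothetical setup: vocabulary ["Paris", "Lyon"], d_model = 2, one head.
# Position 0 stands for the subject token, position 1 for a previous answer.
unembed = [[1.0, 0.0],
           [0.0, 1.0]]
attn_weights = [[0.5, 0.5]]
value_outputs = [[[1.0, 1.0],     # subject: pushes both answers up
                  [-2.0, 0.0]]]   # previous answer "Paris": pushes "Paris" down

subject_logits = token_lens(attn_weights, value_outputs, unembed, [0])
previous_logits = token_lens(attn_weights, value_outputs, unembed, [1])
combined_logits = token_lens(attn_weights, value_outputs, unembed, [0, 1])
```

In this toy setting the subject position contributes equally to both candidate answers, while the previous-answer position contributes negatively to the answer already given, so the combined update favors the answer not yet listed. Real models would need the per-head value and output projections extracted from the network, which this sketch takes as given.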
