Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries

Abstract

To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Attention then attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to early decoding and causal tracing, we analyze how components use different tokens by introducing two methods. The first, Token Lens, decodes aggregated attention updates from specified tokens. The second, a knockout method, analyzes changes in MLP outputs after removing attention to specified tokens. Overall, we provide new insights into how LMs' internal components interact with different input tokens to support complex factual recall. Code is available at https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
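The abstract describes Token Lens only as decoding aggregated attention updates from specified tokens. The following is a minimal sketch of one way to read that description, not the authors' implementation (see the linked repository for that). It assumes a single attention layer with toy random weights, no layer normalization, and an unembedding matrix `W_U`. All function names here (`attention_update_by_token`, `token_lens`, and so on) are hypothetical. The idea is that the attention update written to the last position is a sum of per-source-token contributions, so the contributions from a chosen token group (for example the subject tokens, or a previous answer) can be summed and projected onto the vocabulary on their own.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def last_position_attention(h, W_q, W_k):
    """Attention weights of the last position over all tokens, shape (heads, seq).

    h: (seq, d_model); W_q, W_k: (heads, d_model, d_head).
    The last position may attend to every token, so no causal mask is needed.
    """
    q = np.einsum("d,hde->he", h[-1], W_q)
    k = np.einsum("sd,hde->hse", h, W_k)
    scores = np.einsum("he,hse->hs", q, k) / np.sqrt(W_q.shape[-1])
    return softmax(scores, axis=-1)


def attention_update_by_token(h, W_q, W_k, W_v, W_o):
    """Row j is what source token j adds to the last position, shape (seq, d_model).

    W_v: (heads, d_model, d_head); W_o: (heads, d_head, d_model).
    """
    attn = last_position_attention(h, W_q, W_k)
    v = np.einsum("sd,hde->hse", h, W_v)
    # Weight each value vector, map it through W_o, and sum over heads.
    return np.einsum("hs,hse,hed->sd", attn, v, W_o)


def full_attention_update(h, W_q, W_k, W_v, W_o):
    """Standard attention output at the last position, shape (d_model,)."""
    attn = last_position_attention(h, W_q, W_k)
    v = np.einsum("sd,hde->hse", h, W_v)
    head_out = np.einsum("hs,hse->he", attn, v)
    return np.einsum("he,hed->d", head_out, W_o)


def token_lens(contrib, token_idx, W_U):
    """Sum the contributions of the chosen source tokens and decode them.

    contrib: (seq, d_model); W_U: (d_model, vocab). Returns logits of shape (vocab,).
    """
    update = contrib[list(token_idx)].sum(axis=0)
    return update @ W_U


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d_model, heads, d_head, vocab = 6, 16, 2, 8, 20  # toy sizes
    h = rng.normal(size=(seq, d_model))
    W_q, W_k, W_v = (rng.normal(size=(heads, d_model, d_head)) for _ in range(3))
    W_o = rng.normal(size=(heads, d_head, d_model))
    W_U = rng.normal(size=(d_model, vocab))

    contrib = attention_update_by_token(h, W_q, W_k, W_v, W_o)
    subject_logits = token_lens(contrib, [1, 2], W_U)  # pretend tokens 1-2 are the subject
    print("most promoted vocab ids:", np.argsort(subject_logits)[-3:][::-1])
    print("most suppressed vocab ids:", np.argsort(subject_logits)[:3])
```

In this reading, large positive logits mark vocabulary entries that the chosen tokens promote through attention, and large negative logits mark entries they suppress. With a real model one would substitute its actual weights and would presumably also account for normalization layers before unembedding, which this sketch omits.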
