Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
February 27, 2025
作者: Tianyi Lorena Yan, Robin Jia
cs.AI
Abstract
To answer one-to-many factual queries (e.g., listing cities of a country), a
language model (LM) must simultaneously recall knowledge and avoid repeating
previous answers. How are these two subtasks implemented and integrated
internally? Across multiple datasets and models, we identify a
promote-then-suppress mechanism: the model first recalls all answers, and then
suppresses previously generated ones. Specifically, LMs use both the subject
and previous answer tokens to perform knowledge recall, with attention
propagating subject information and MLPs promoting the answers. Then, attention
attends to and suppresses previous answer tokens, while MLPs amplify the
suppression signal. Our mechanism is corroborated by extensive experimental
evidence: in addition to using early decoding and causal tracing, we analyze
how components use different tokens by introducing both Token Lens, which
decodes aggregated attention updates from specified tokens, and a knockout
method that analyzes changes in MLP outputs after removing attention to
specified tokens. Overall, we provide new insights into how LMs' internal
components interact with different input tokens to support complex factual
recall. Code is available at
https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
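The Token Lens described above decodes the aggregated attention update contributed by specified tokens into vocabulary space. A minimal sketch of that idea, in the style of logit-lens decoding; the function name, matrix shapes, and toy weights here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def token_lens(attn_weights, values, W_O, W_U, src_positions):
    """Decode the aggregated attention update from a set of source tokens.

    attn_weights:  (seq,) attention from the current query to each key position
    values:        (seq, d_head) value vectors at each source position
    W_O:           (d_head, d_model) attention output projection
    W_U:           (d_model, vocab) unembedding matrix
    Returns (vocab,) logits for the aggregated update (logit-lens style).
    """
    # Sum only the contributions written by the specified source tokens.
    update = np.zeros(W_O.shape[1])
    for p in src_positions:
        update += attn_weights[p] * (values[p] @ W_O)
    # Project the residual-stream update into vocabulary logits.
    return update @ W_U

# Toy example with random weights (shapes are illustrative only).
rng = np.random.default_rng(0)
seq, d_head, d_model, vocab = 8, 16, 32, 100
attn = rng.random(seq)
attn /= attn.sum()
vals = rng.standard_normal((seq, d_head))
W_O = rng.standard_normal((d_head, d_model))
W_U = rng.standard_normal((d_model, vocab))
logits = token_lens(attn, vals, W_O, W_U, src_positions=[2, 3])
print(logits.shape)
```

Large positive logits for previous-answer tokens would indicate promotion from those positions, and large negative logits suppression, which is the signal the paper's analyses examine.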