Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
April 11, 2026
Authors: Zunhai Su, Hengyuan Zhang, Wei Wu, Yifan Zhang, Yaxiu Liu, He Xiao, Qingyao Yang, Yuxuan Sun, Rui Yang, Chao Zhang, Keyu Fan, Weihao Ye, Jing Xiong, Hui Shen, Chaofan Tao, Taiqiang Wu, Zhongwei Wan, Yulei Qian, Yuchen Xie, Ngai Wong
cs.AI
Abstract
As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across Transformer variants is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. A continuously updated paper list for this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
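To make the phenomenon concrete, the sketch below (illustrative only, not code from the survey) measures how much attention mass the queries of a causal language model assign to the first token of a prompt, which is where the sink typically forms. The model choice ("gpt2"), the prompt text, and all variable names are assumptions for the example; a disproportionately large value in many layers and heads is the usual sink signature.

```python
# Minimal sketch: quantify the Attention Sink as the average attention
# that queries pay to the first token, per layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# "eager" attention is needed so the model can return attention matrices.
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager")
model.eval()

text = "Attention sinks absorb a disproportionate share of attention mass."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
for layer_idx, attn in enumerate(out.attentions):
    # Average, over heads and over queries at positions 1..seq-1, of the
    # attention weight placed on key position 0 (the first token).
    sink_mass = attn[0, :, 1:, 0].mean().item()
    print(f"layer {layer_idx:2d}: mean attention to first token = {sink_mass:.3f}")
```

Any inspection along these lines (the exact aggregation is a design choice) suffices to reproduce the basic observation motivating the survey: a few uninformative positions receive far more attention than their content warrants.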