
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

April 11, 2026
作者: Zunhai Su, Hengyuan Zhang, Wei Wu, Yifan Zhang, Yaxiu Liu, He Xiao, Qingyao Yang, Yuxuan Sun, Rui Yang, Chao Zhang, Keyu Fan, Weihao Ye, Jing Xiong, Hui Shen, Chaofan Tao, Taiqiang Wu, Zhongwei Wan, Yulei Qian, Yuchen Xie, Ngai Wong
cs.AI

Abstract

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
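To make the phenomenon concrete, here is a minimal toy sketch (not from the survey) of how a "sink" shows up in an attention matrix: if the key of one token happens to align with most queries, softmax attention concentrates disproportionate mass on that token. The bias injected into `k[0]` and the `sink_score` helper are hypothetical constructs for illustration only, mimicking the sink pattern rather than reproducing any trained model's behavior.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sink_score(attn, sink_idx=0):
    """Average attention mass each query assigns to the sink token."""
    return attn[:, sink_idx].mean()

rng = np.random.default_rng(0)
seq, d = 8, 16
q = rng.normal(size=(seq, d))
k = rng.normal(size=(seq, d))

# Artificially align the key of token 0 with the average query,
# mimicking the disproportionate-attention pattern described above.
k[0] += q.mean(axis=0) * 10.0

scores = q @ k.T / np.sqrt(d)
# Causal mask: each query may only attend to itself and earlier tokens.
scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf
attn = softmax(scores, axis=-1)

print(f"mean attention on token 0: {sink_score(attn):.3f}")
```

With the injected bias, the mean attention on token 0 far exceeds what a uniform spread over visible tokens would give, which is the signature pattern the survey's "interpretation" and "mitigation" threads are concerned with.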