Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
January 29, 2026
Authors: Qingyue Yang, Jie Wang, Xing Li, Yinqi Bai, Xialiang Tong, Huiling Zhen, Jianye Hao, Mingxuan Yuan, Bin Li
cs.AI
Abstract
Attention patterns play a crucial role in both training and inference of large language models (LLMs). Prior works have identified individual patterns such as retrieval heads, sink heads, and diagonal traces, yet these observations remain fragmented and lack a unifying explanation. To bridge this gap, we introduce Temporal Attention Pattern Predictability Analysis (TAPPA), a unifying framework that explains diverse attention patterns by analyzing their underlying mathematical formulations from a temporally continuous perspective. TAPPA both deepens the understanding of attention behavior and guides inference acceleration approaches. Specifically, TAPPA characterizes attention patterns as predictable patterns with clear regularities and unpredictable patterns that appear effectively random. Our analysis further reveals that this distinction can be explained by the degree of query self-similarity along the temporal dimension. Focusing on the predictable patterns, we further provide a detailed mathematical analysis of three representative cases through the joint effect of queries, keys, and Rotary Positional Embeddings (RoPE). We validate TAPPA by applying its insights to KV cache compression and LLM pruning tasks. Across these tasks, a simple metric motivated by TAPPA consistently improves performance over baseline methods. The code is available at https://github.com/MIRALab-USTC/LLM-TAPPA.
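The abstract attributes the split between predictable and unpredictable attention patterns to the degree of query self-similarity along the temporal dimension. As a rough illustration only, the sketch below scores each attention head by the cosine similarity of its query vectors at adjacent positions; the function name, tensor layout, and the adjacent-position definition are assumptions made here for clarity, not the exact metric defined in the paper or the LLM-TAPPA repository.

```python
import torch
import torch.nn.functional as F


def temporal_query_self_similarity(queries: torch.Tensor) -> torch.Tensor:
    """Score how slowly a head's queries drift over time (hypothetical metric).

    queries: [num_heads, seq_len, head_dim] query projections for one layer.
    Returns one score per head in [-1, 1]. Values near 1 mean the query
    direction barely changes between adjacent positions, i.e. the head's
    temporal behavior is highly self-similar (the "predictable" regime
    described in the abstract); low values correspond to effectively
    random-looking, "unpredictable" patterns.
    """
    q = F.normalize(queries, dim=-1)
    # Cosine similarity between each query and the query one position earlier.
    sim = (q[:, 1:, :] * q[:, :-1, :]).sum(dim=-1)  # [num_heads, seq_len - 1]
    return sim.mean(dim=-1)                          # [num_heads]


if __name__ == "__main__":
    # Toy check: 8 heads, 128 positions, 64-dim queries.
    scores = temporal_query_self_similarity(torch.randn(8, 128, 64))
    print(scores)
```

Under this reading, a per-head score like the one above could serve as the kind of "simple metric" the abstract mentions, e.g. to rank heads when deciding how aggressively to compress their KV cache or whether to prune them; the actual criterion used in the paper may differ.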