

Demystifying the Slash Pattern in Attention: The Role of RoPE

January 13, 2026
Authors: Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Pang, Aixin Sun, Zhuoran Yang
cs.AI

Abstract

Large Language Models (LLMs) often exhibit slash attention patterns, in which attention scores concentrate along the Δ-th sub-diagonal for some offset Δ. These patterns play a key role in passing information across tokens, yet why they emerge has remained unclear. In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to the models and generalize to out-of-distribution prompts. To explain this intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine the attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, queries and keys are nearly identical across tokens, and the interaction among the medium- and high-frequency components of RoPE gives rise to SDHs. Beyond this empirical evidence, we formalize the two conditions as modeling assumptions and theoretically show that they are sufficient for SDHs to emerge. In particular, we analyze the training dynamics of a shallow Transformer equipped with RoPE under these conditions, and prove that models trained via gradient descent exhibit SDHs that generalize to out-of-distribution prompts.
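To make the mechanism concrete, here is a minimal NumPy sketch (not the paper's code; the rope_rotate helper, the RoPE base, and the random shared q and k are illustrative assumptions). Under the rank-one condition, every token shares essentially one query vector q and one key vector k, so the pre-softmax score between query position m and key position n reduces to (R_m q)·(R_n k) = q·R_{n−m} k, a function of the offset alone. Sweeping that offset exposes a peak at some Δ, which after softmax yields the slash along the Δ-th sub-diagonal:

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply RoPE at position `pos`: rotate each pair (x[2j], x[2j+1])
    by the angle pos * theta_j, with theta_j = base^(-2j/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # frequencies theta_j
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

rng = np.random.default_rng(0)
d = 64
# Rank-one condition (paper's condition 1), idealized: every token shares
# the same query vector q and key vector k.  Random vectors here are a
# stand-in for whatever direction the trained head actually uses.
q = rng.normal(size=d)
k = rng.normal(size=d)

# score(m, n) = (R_m q)·(R_n k) = q·R_{n-m} k depends only on the offset.
# Sweep Delta = m - n >= 0 (a query attending Delta tokens back); note
# R_0 = I, so rope_rotate(k, 0) is just k.
offsets = np.arange(128)
profile = np.array([rope_rotate(q, delta) @ k for delta in offsets])

delta_star = int(np.argmax(profile))
print(f"score profile peaks at offset Delta = {delta_star}")
# The profile is identical for every query position, so after softmax each
# row puts its mass at the same relative offset Delta*: a slash pattern
# along the Delta*-th sub-diagonal.
```

With random q and k the peak lands at some arbitrary Δ*; the point is that the score profile is shared across all query positions, so attention mass concentrates on one fixed sub-diagonal. Intuitively, if the rotation were dominated by low-frequency components, the profile would vary too slowly over typical context lengths to single out one offset, which is one way to read the paper's second condition.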