Demystifying the Slash Pattern in Attention: The Role of RoPE

January 13, 2026
Authors: Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Pang, Aixin Sun, Zhuoran Yang
cs.AI

Abstract

Large Language Models (LLMs) often exhibit slash attention patterns, in which attention scores concentrate along the Δ-th sub-diagonal for some offset Δ. These patterns play a key role in passing information across tokens, yet why they emerge has remained unclear. In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to the models and generalize to out-of-distribution prompts. To explain this intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine the attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) the queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, the queries and keys are nearly identical across tokens, and interactions between the medium- and high-frequency components of RoPE give rise to SDHs. Beyond the empirical evidence, we formalize these conditions as modeling assumptions and theoretically show that they are sufficient for the emergence of SDHs. In particular, we analyze the training dynamics of a shallow Transformer equipped with RoPE under these conditions, and prove that models trained via gradient descent exhibit SDHs that generalize to out-of-distribution prompts.