DoPE: Denoising Rotary Position Embedding

November 12, 2025
Authors: Jing Xiong, Liyang Fan, Hui Shen, Zunhai Su, Min Yang, Lingpeng Kong, Ngai Wong
cs.AI

Abstract

Rotary Position Embedding (RoPE) in Transformer models has inherent limitations that weaken length extrapolation. We reinterpret the attention map with positional encoding as a noisy feature map, and propose Denoising Positional Encoding (DoPE), a training-free method based on truncated matrix entropy that detects outlier frequency bands in the feature map. Leveraging the noise characteristics of the feature map, we further reparameterize it with a parameter-free Gaussian distribution to achieve robust extrapolation. Our method theoretically reveals the underlying cause of the attention sink phenomenon and its connection to truncated matrix entropy. Experiments on needle-in-a-haystack and many-shot in-context learning tasks demonstrate that DoPE significantly improves retrieval accuracy and reasoning stability across extended contexts (up to 64K tokens). The results show that the denoising strategy for positional embeddings effectively mitigates attention sinks and restores balanced attention patterns, providing a simple yet powerful solution for improving length generalization. Our project page is: https://The-physical-picture-of-LLMs.github.io
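
The abstract describes the method only at a high level; the PyTorch snippet below is a minimal, hypothetical sketch of that description, not the authors' implementation. It assumes truncated matrix entropy is computed per RoPE frequency band from the top-k singular values of the post-RoPE feature map, that an outlier band is one whose entropy deviates strongly from the other bands (a z-score test here), and that "reparameterize with a parameter-free Gaussian" means resampling flagged bands from a moment-matched Gaussian. The band size, truncation rank `k`, threshold, and all function names are illustrative assumptions.

```python
import torch

def truncated_matrix_entropy(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Entropy of the top-k normalized singular values of a feature map x
    of shape [seq_len, band_dim]; unusually low or high entropy is treated
    as a sign of a degenerate (outlier) frequency band."""
    s = torch.linalg.svdvals(x)[:k]              # top-k singular values
    p = s / s.sum().clamp_min(1e-12)             # normalize to a distribution
    return -(p * p.clamp_min(1e-12).log()).sum()

def denoise_bands(q: torch.Tensor, band_size: int = 8,
                  z_thresh: float = 2.0) -> torch.Tensor:
    """Split the head dimension of q ([seq_len, head_dim]) into frequency
    bands, flag bands whose truncated matrix entropy is an outlier relative
    to the other bands, and resample flagged bands from a parameter-free
    Gaussian matched to the band's mean and std (an assumption about what
    'reparameterize' means in the abstract)."""
    bands = q.split(band_size, dim=-1)
    ents = torch.stack([truncated_matrix_entropy(b) for b in bands])
    z = (ents - ents.mean()) / ents.std().clamp_min(1e-12)
    out = []
    for b, zi in zip(bands, z):
        if zi.abs() > z_thresh:                  # outlier band: resample
            b = b.mean() + b.std() * torch.randn_like(b)
        out.append(b)
    return torch.cat(out, dim=-1)

q = torch.randn(1024, 64)      # toy post-RoPE query feature map
q_denoised = denoise_bands(q)  # same shape, outlier bands replaced
print(q_denoised.shape)        # torch.Size([1024, 64])
```

Being training-free, a step like this could in principle be applied at inference time to attention inputs without touching model weights, which is consistent with how the abstract positions DoPE.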