Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
December 8, 2025
Authors: Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu
cs.AI
Abstract
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at https://github.com/OpenMOSS/rope_pp.
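To make the idea concrete, below is a minimal sketch (not the authors' implementation, which is available at the repository above) of how the imaginary component arises. It assumes PyTorch and the common half-split RoPE convention; the helper names (`apply_rope`, `complex_attention_scores`) and the toy setup are illustrative only. Viewing the RoPE-rotated queries and keys as complex vectors, the real part of their complex inner product reproduces the standard RoPE attention logits, while the imaginary part is the phase information that standard RoPE discards and that a dual-component score could re-incorporate.

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: swap and negate the two halves of the head
    # dimension so that (x * cos + rotate_half(x) * sin) acts as a
    # per-pair complex rotation.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # Rotate each feature pair by a position-dependent angle.
    return x * cos + rotate_half(x) * sin

def complex_attention_scores(q, k, cos, sin):
    """Illustrative sketch: return both components of the complex-valued
    attention score. The real part equals the usual RoPE dot product;
    the imaginary part is the otherwise-discarded phase term."""
    q_rot = apply_rope(q, cos, sin)
    k_rot = apply_rope(k, cos, sin)

    d = q.shape[-1] // 2
    # Interpret the first/second halves of the head dimension as the
    # real/imaginary parts of a complex feature vector.
    q_c = torch.complex(q_rot[..., :d], q_rot[..., d:])
    k_c = torch.complex(k_rot[..., :d], k_rot[..., d:])

    # Complex inner product <q, conj(k)> over the head dimension.
    scores = torch.einsum("...qd,...kd->...qk", q_c, k_c.conj())
    return scores.real, scores.imag

# Toy usage: batch=1, heads=1, seq=4, head_dim=8 (hypothetical shapes).
b, h, s, dh = 1, 1, 4, 8
theta = 10000.0 ** (-torch.arange(0, dh // 2).float() / (dh // 2))  # RoPE frequencies
pos = torch.arange(s).float()
angles = torch.outer(pos, theta)                       # (s, dh/2)
cos = torch.cat((angles.cos(), angles.cos()), dim=-1)  # (s, dh), half-split convention
sin = torch.cat((angles.sin(), angles.sin()), dim=-1)

q = torch.randn(b, h, s, dh)
k = torch.randn(b, h, s, dh)
re_scores, im_scores = complex_attention_scores(q, k, cos, sin)  # each (b, h, s, s)
```

How the two components are weighted and combined into the final attention score is the paper's contribution; the sketch only shows where the discarded imaginary term comes from.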