Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
December 8, 2025
Authors: Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu
cs.AI
Abstract
Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at https://github.com/OpenMOSS/rope_pp.
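The following is a minimal PyTorch sketch, not the paper's released implementation, illustrating the idea described above: RoPE rotates query/key pairs in the complex plane, the standard attention score keeps only the real part of the complex inner product, and a dual-component score also retains the imaginary part. The mixing weight `alpha` is a hypothetical illustration, not a detail taken from the paper; see the repository at https://github.com/OpenMOSS/rope_pp for the actual method.

```python
import torch

def as_complex(x: torch.Tensor) -> torch.Tensor:
    # View the last dimension (head_dim) as head_dim // 2 complex numbers.
    return torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))

def rope_rotate(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, head_dim) -> complex (seq_len, head_dim // 2),
    # each pair rotated by a position-dependent angle m * theta_j.
    xc = as_complex(x)
    dim_half = xc.shape[-1]
    inv_freq = base ** (-torch.arange(dim_half, dtype=torch.float32) / dim_half)
    angles = positions[:, None].float() * inv_freq[None, :]      # (seq_len, dim_half)
    return xc * torch.polar(torch.ones_like(angles), angles)     # multiply by e^{i * m * theta}

seq_len, head_dim = 8, 64
q = torch.randn(seq_len, head_dim)
k = torch.randn(seq_len, head_dim)
pos = torch.arange(seq_len)

qc = rope_rotate(q, pos)
kc = rope_rotate(k, pos)

# Complex-valued inner product between every query and key position.
scores_complex = torch.einsum("id,jd->ij", qc, kc.conj())

# Standard RoPE attention score: real part only (equivalent to the usual
# rotate-then-real-dot-product formulation).
scores_real = scores_complex.real

# Dual-component score re-incorporating the otherwise discarded imaginary part.
# `alpha` is a hypothetical mixing weight, used here purely for illustration.
alpha = 0.5
scores_dual = scores_complex.real + alpha * scores_complex.imag

print(scores_real.shape, scores_dual.shape)  # torch.Size([8, 8]) torch.Size([8, 8])
```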