CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs
February 5, 2026
Authors: Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang
cs.AI
Abstract
Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoPE frequencies to accommodate unseen positions, and (2) semantic modeling, which posits that the attention scores computed with RoPE should always prioritize semantically similar tokens. In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely CoPE: soft clipping the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping. Extensive experiments demonstrate that simply applying our soft clipping strategy to RoPE yields significant performance gains that scale up to 256k context length, validating our theoretical analysis and establishing CoPE as a new state-of-the-art for length generalization. Our code, data, and models are available at https://github.com/hrlics/CoPE.
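To make the contrast between hard and soft clipping concrete, here is a minimal sketch of how a smooth cap on RoPE's low-frequency rotation angles differs from an abrupt truncation. The clipped quantity (the per-position rotation angle), the tanh-style saturating function, and the 8k cap are illustrative assumptions for exposition only, not the exact CoPE formulation, which is specified in the paper and the linked repository.

```python
import numpy as np

def rope_angles(positions: np.ndarray, head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE rotation angles: angle[m, i] = m * base^(-2i / head_dim)."""
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return np.outer(positions, inv_freq)

def hard_clip(angles: np.ndarray, cap: float) -> np.ndarray:
    """Hard clipping: angles are truncated at `cap`; the sharp kink is the kind
    of discontinuity the abstract associates with spectral leakage."""
    return np.minimum(angles, cap)

def soft_clip(angles: np.ndarray, cap: float) -> np.ndarray:
    """Smooth saturating cap (tanh-style): roughly the identity well below `cap`,
    approaching `cap` asymptotically instead of kinking."""
    return cap * np.tanh(angles / cap)

# Illustrative usage: the lowest-frequency dimension of a 128-dim head at
# positions far beyond a hypothetical 8k training window.
positions = np.array([1_000, 8_000, 64_000, 256_000])
angles = rope_angles(positions, head_dim=128)
cap = rope_angles(np.array([8_000]), head_dim=128)[0, -1]  # hypothetical cap
print(angles[:, -1])                  # unclipped low-frequency angles (OOD beyond 8k)
print(hard_clip(angles, cap)[:, -1])  # hard-clipped: flat at the cap
print(soft_clip(angles, cap)[:, -1])  # soft-clipped: smooth saturation toward the cap
```

In this toy setup, both variants keep extrapolated low-frequency angles within the range covered by the assumed training window; the soft version simply replaces the hard cutoff with a smooth transition, which is the property the abstract credits with avoiding spectral leakage.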