Group Representational Position Encoding
December 8, 2025
Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
cs.AI
Abstract
We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations in $\mathrm{SO}(d)$ (Multiplicative GRAPE) and (ii) additive logit biases arising from unipotent actions in the general linear group $\mathrm{GL}$ (Additive GRAPE). In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $G(n) = \exp(n\,\omega\,L)$ with a rank-2 skew-symmetric generator $L \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling, at $O(d)$ and $O(rd)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
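The two mechanisms in the abstract can be checked numerically with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. For the multiplicative part it assumes the rank-2 skew generator is parameterized as L = a b^T - b a^T for two linearly independent vectors a and b (the abstract does not give the parameterization); for such an L, L^3 = -s^2 L with s^2 = |a|^2 |b|^2 - (a·b)^2, so the exponential has a Rodrigues-style closed form. For the additive part it assumes a hypothetical 2-dimensional augmented coordinate in which the unipotent matrix U(m) = I + mN (with N^2 = 0) produces an ALiBi-style bias. All function names are illustrative.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def rank2_generator(a, b):
    # Assumed parameterization: L = a b^T - b a^T (skew-symmetric, rank 2).
    return [[ai * bj - bi * aj for aj, bj in zip(a, b)] for ai, bi in zip(a, b)]

def grape_rotation(n, omega, a, b):
    """Closed form of exp(n * omega * L) for L = a b^T - b a^T.

    Uses L^3 = -s^2 L, which gives
    exp(t L) = I + (sin(t s) / s) L + ((1 - cos(t s)) / s^2) L^2.
    Requires a and b to be linearly independent (s > 0).
    """
    d = len(a)
    L = rank2_generator(a, b)
    L2 = matmul(L, L)
    s = math.sqrt(dot(a, a) * dot(b, b) - dot(a, b) ** 2)
    theta = n * omega * s
    c1 = math.sin(theta) / s
    c2 = (1.0 - math.cos(theta)) / (s * s)
    return [[(1.0 if i == j else 0.0) + c1 * L[i][j] + c2 * L2[i][j]
             for j in range(d)] for i in range(d)]

def unipotent(m):
    # U(m) = I + m N with N = [[0, 1], [0, 0]] and N N = 0,
    # so U(m) U(n) = U(m + n) and U(i)^{-1} U(j) = U(j - i).
    return [[1.0, float(m)], [0.0, 1.0]]

def additive_bias(i, j, slope):
    # Hypothetical augmented coordinates: key side (0, 1), query side (slope, 0).
    # U(j - i) maps the key side to (j - i, 1); the dot product with the
    # query side is slope * (j - i), i.e. -slope * (i - j) for causal j <= i.
    moved_key = matmul(unipotent(j - i), [[0.0], [1.0]])
    return slope * moved_key[0][0]

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

# Relative law: G(i)^T G(j) depends only on the offset j - i.
a = [1.0, 0.5, 0.0, -0.3]
b = [0.2, 1.0, 0.7, 0.0]
omega = 0.1
G2 = grape_rotation(2, omega, a, b)
G7 = grape_rotation(7, omega, a, b)
G5 = grape_rotation(5, omega, a, b)
print(max_abs_diff(matmul(transpose(G2), G7), G5))  # zero up to rounding error
print(additive_bias(5, 2, 0.5))                     # slope * (j - i)
```

Because G(i)^T G(j) = G(j - i), the attention logit between a rotated query at position i and a rotated key at position j depends only on the offset, which is the relative property the abstract refers to. In this sketch, stacking d/2 such rotations on the canonical coordinate pairs with different values of omega would correspond to the RoPE case.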