Group Representational Position Encoding

December 8, 2025
Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
cs.AI

Abstract

We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in SO(d) and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group GL. In Multiplicative GRAPE, a position n ∈ ℤ (or t ∈ ℝ) acts as G(n) = exp(n ω L) with a rank-2 skew generator L ∈ ℝ^{d×d}, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the d/2 planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at O(d) and O(r d) cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
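
The two mechanisms named in the abstract can be illustrated with a small pure-Python sketch. It is not taken from the paper or its repository. It assumes the rank-2 skew generator has the form L = b aᵀ − a bᵀ for orthonormal vectors a and b, in which case L³ = −L and exp(θL) = I + sin(θ) L + (1 − cos(θ)) L². For the additive side it assumes a hypothetical augmentation of queries and keys by two extra coordinates, with the query transformed by the inverse transpose of the unipotent map. All function and variable names are illustrative.

```python
import math

# Tiny pure-Python linear algebra helpers (matrices are lists of rows).
def identity(d):
    return [[float(i == j) for j in range(d)] for i in range(d)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

# --- Multiplicative part: rotation generated by a rank-2 skew matrix ---
def skew_generator(a, b):
    """L = b a^T - a b^T (assumed parametrization; a, b must be orthonormal)."""
    return [[bi * aj - ai * bj for aj, bj in zip(a, b)] for ai, bi in zip(a, b)]

def rotation(n, omega, L):
    """exp(n*omega*L) = I + sin(t) L + (1 - cos(t)) L^2 with t = n*omega.
    The closed form holds because L^3 = -L for orthonormal a, b."""
    t, d, L2 = n * omega, len(L), matmul(L, L)
    I = identity(d)
    return [[I[i][j] + math.sin(t) * L[i][j] + (1.0 - math.cos(t)) * L2[i][j]
             for j in range(d)] for i in range(d)]

# --- Additive part: rank-1 unipotent action on augmented vectors ---
def unipotent(n, d):
    """exp(n*N) = I + n*N for the nilpotent N = e_d e_{d+1}^T on R^(d+2)."""
    U = identity(d + 2)
    U[d][d + 1] = float(n)
    return U

def additive_logit(q, k, m, n, slope):
    """Logit between a query at position m and a key at position n."""
    d = len(q)
    q_aug, k_aug = q + [slope, 0.0], k + [0.0, 1.0]     # hypothetical augmentation
    q_pos = matvec(transpose(unipotent(-m, d)), q_aug)  # U(m)^{-T} q_aug
    k_pos = matvec(unipotent(n, d), k_aug)              # U(n) k_aug, depends only on n
    return dot(q_pos, k_pos)

# A rotation plane in R^4 that is not a coordinate pair.
r = 1.0 / math.sqrt(2.0)
a, b = [r, 0.0, r, 0.0], [0.0, 1.0, 0.0, 0.0]
L, omega, m, n = skew_generator(a, b), 0.3, 7, 3
G = lambda pos: rotation(pos, omega, L)

# Relative law G(m)^T G(n) = G(n - m), and orthogonality (norm preservation).
rel_err = max_abs_diff(matmul(transpose(G(m)), G(n)), G(n - m))
orth_err = max_abs_diff(matmul(transpose(G(m)), G(m)), identity(4))

# Canonical coordinate pair (e0, e1): the familiar 2x2 RoPE rotation block.
rope_block = rotation(2, 0.5, skew_generator([1.0, 0.0], [0.0, 1.0]))

# Additive logit compared with a linear, ALiBi-style distance penalty.
q, k, slope = [0.2, -1.0, 0.5], [1.0, 0.3, -0.7], 0.25
logit = additive_logit(q, k, m, n, slope)
alibi = dot(q, k) - slope * (m - n)
print(rel_err, orth_err, rope_block, logit, alibi)
```

Under these assumptions, the rotated attention score depends only on the offset n − m, and the additive logit reduces to the content score minus slope · (m − n). The transformed key depends only on its own position, which is what lets it be cached during streaming. The sketch leaves out the log-uniform frequency spectrum, the learned commuting and non-commuting extensions, and the FoX special case.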
December 10, 2025