
SeqPE: Transformer with Sequential Position Encoding

June 16, 2025
Authors: Huyang Li, Yahui Liu, Hongyu Sun, Deng Cai, Leyang Cui, Wei Bi, Peilin Zhao, Taro Watanabe
cs.AI

Abstract

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, the fixed-size lookup tables used in traditional learnable position embeddings (PEs) limit extrapolation beyond pre-trained sequence lengths. Expert-designed methods such as ALiBi and RoPE mitigate this limitation but demand extensive modifications when adapting to new modalities, underscoring fundamental challenges in adaptability and scalability. In this work, we present SeqPE, a unified and fully learnable position encoding framework that represents each n-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn its embedding in an end-to-end manner. To regularize SeqPE's embedding space, we introduce two complementary objectives: a contrastive objective that aligns embedding distances with a predefined position-distance function, and a knowledge distillation loss that anchors out-of-distribution position embeddings to in-distribution teacher representations, further enhancing extrapolation performance. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy, particularly under context length extrapolation, but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign. We release our code, data, and checkpoints at https://github.com/ghrua/seqpe.
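The core idea, treating a position index as a symbol sequence and encoding it with a small learnable module, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the token vocabulary, the `,` dimension separator, the embedding width, and the mean-pooling "encoder" (the paper's lightweight sequential encoder is learned end-to-end and differs in detail).

```python
import numpy as np

# Hypothetical vocabulary: digit symbols plus a separator between dimensions.
DIGITS = "0123456789"
SEP = ","  # assumed separator token, not specified in the abstract
VOCAB = list(DIGITS) + [SEP]
TOK2ID = {t: i for i, t in enumerate(VOCAB)}

def position_to_tokens(pos):
    """Write an n-dimensional position index, e.g. (12, 3), as a symbol sequence."""
    return list(SEP.join(str(p) for p in pos))

rng = np.random.default_rng(0)
D = 16  # embedding width, arbitrary for this sketch
token_emb = rng.normal(size=(len(VOCAB), D))  # learnable table in the real model

def seq_position_embedding(pos):
    """Map the symbol sequence to a single position embedding.
    Mean pooling stands in for the paper's sequential encoder to keep
    the sketch self-contained."""
    ids = [TOK2ID[t] for t in position_to_tokens(pos)]
    return token_emb[ids].mean(axis=0)

# The same encoder handles 1-D and 2-D indices with no architectural change:
e_1d = seq_position_embedding((1234,))   # language-model position
e_2d = seq_position_embedding((12, 34))  # image patch coordinate
print(e_1d.shape, e_2d.shape)
```

Because any integer index can be spelled out as symbols, positions far beyond the training range still receive embeddings, which is the property the contrastive and distillation objectives then regularize.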