

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

August 30, 2025
Author: Rishiraj Acharya
cs.AI

Abstract

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating a significant bottleneck for processing long contexts. In this paper, we propose the Gated Associative Memory (GAM) network, a novel, fully parallel architecture for sequence modeling that exhibits linear complexity (O(N)) with respect to sequence length. The GAM block replaces the self-attention layer with two parallel pathways: a causal convolution to efficiently capture local, position-dependent context, and a parallel associative memory retrieval mechanism to model global, content-based patterns. These pathways are dynamically fused by a gating mechanism, allowing the model to flexibly combine local and global information for each token. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer and a modern linear-time baseline (Mamba) on the WikiText-2 benchmark, as well as against the Transformer on the TinyStories dataset. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines in training speed, and achieves superior or competitive final validation perplexity on all datasets, establishing it as a promising and efficient alternative for sequence modeling.
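To make the architecture description concrete, below is a minimal PyTorch sketch of a GAM-style block under stated assumptions: the abstract gives no implementation details, so the class name GAMBlock, the depthwise causal convolution, the fixed-size learned memory bank (num_slots), and the per-channel sigmoid gate are illustrative choices, not the authors' actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAMBlock(nn.Module):
    """Hypothetical sketch of a Gated Associative Memory block.

    Two parallel O(N) pathways replace self-attention:
      1. a causal depthwise convolution for local, position-dependent context;
      2. retrieval from a fixed-size learned memory bank for global,
         content-based patterns.
    A learned gate fuses the two pathways per token.
    """

    def __init__(self, d_model: int, num_slots: int = 64, kernel_size: int = 4):
        super().__init__()
        # Local pathway: depthwise conv, padded so no token sees the future.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        # Global pathway: learned memory keys/values, queried per token.
        self.mem_keys = nn.Parameter(torch.randn(num_slots, d_model) * d_model ** -0.5)
        self.mem_values = nn.Parameter(torch.randn(num_slots, d_model) * d_model ** -0.5)
        self.query = nn.Linear(d_model, d_model)
        # Gate: per-token, per-channel mixture of the two pathways.
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Causal conv over time; trim the right-side overhang from the padding.
        local = self.conv(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        # Soft retrieval over num_slots memory entries: O(N * num_slots),
        # i.e. linear in sequence length, unlike O(N^2) self-attention.
        scores = self.query(x) @ self.mem_keys.t()                # (B, N, slots)
        global_ctx = F.softmax(scores, dim=-1) @ self.mem_values  # (B, N, D)
        # Dynamic fusion: sigmoid gate blends local and global information.
        g = torch.sigmoid(self.gate(x))
        return g * local + (1.0 - g) * global_ctx

# Example usage:
# block = GAMBlock(d_model=256)
# y = block(torch.randn(2, 128, 256))  # -> (2, 128, 256)
```

Because the memory bank has a fixed number of slots, the retrieval step costs O(N · num_slots) rather than O(N^2), which is consistent with the linear scaling claimed in the abstract; the paper's actual memory-retrieval and fusion details may differ.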