MemMamba：重新思考状态空间模型中的记忆模式

摘要

随着数据的爆炸式增长，长序列建模在自然语言处理和生物信息学等任务中变得愈发重要。然而，现有方法在效率和内存之间面临固有的权衡。循环神经网络因梯度消失和爆炸问题而难以扩展。Transformer虽能建模全局依赖关系，却受限于二次方复杂度。近期，如Mamba等选择性状态空间模型展示了O(n)时间复杂度和O(1)递归推理的高效性，但其长程记忆呈指数衰减。本研究通过数学推导和信息论分析，系统揭示了Mamba的记忆衰减机制，解答了一个根本问题：Mamba的长程记忆本质是什么，它如何保留信息？为量化关键信息损失，我们进一步引入了水平-垂直记忆保真度指标，捕捉层内及跨层的信息退化。受人类阅读长文档时提炼并保留关键信息的启发，我们提出了MemMamba，一种新颖的架构框架，它整合了状态摘要机制与跨层跨令牌注意力，在保持线性复杂度的同时缓解了长程遗忘问题。MemMamba在PG19和Passkey Retrieval等长序列基准测试上显著优于现有Mamba变体和Transformer，推理效率提升了48%。理论分析与实证结果均表明，MemMamba在复杂度与记忆的权衡上实现了突破，为超长序列建模提供了新范式。

English

With the explosive growth of data, long-sequence modeling has become increasingly important in tasks such as natural language processing and bioinformatics. However, existing methods face inherent trade-offs between efficiency and memory. Recurrent neural networks suffer from gradient vanishing and explosion, making them hard to scale. Transformers can model global dependencies but are constrained by quadratic complexity. Recently, selective state-space models such as Mamba have demonstrated high efficiency with O(n) time and O(1) recurrent inference, yet their long-range memory decays exponentially. In this work, we conduct mathematical derivations and information-theoretic analysis to systematically uncover the memory decay mechanism of Mamba, answering a fundamental question: what is the nature of Mamba's long-range memory and how does it retain information? To quantify key information loss, we further introduce horizontal-vertical memory fidelity metrics that capture degradation both within and across layers. Inspired by how humans distill and retain salient information when reading long documents, we propose MemMamba, a novel architectural framework that integrates state summarization mechanism together with cross-layer and cross-token attention, which alleviates long-range forgetting while preserving linear complexity. MemMamba achieves significant improvements over existing Mamba variants and Transformers on long-sequence benchmarks such as PG19 and Passkey Retrieval, while delivering a 48% speedup in inference efficiency. Both theoretical analysis and empirical results demonstrate that MemMamba achieves a breakthrough in the complexity-memory trade-off, offering a new paradigm for ultra-long sequence modeling.

MemMamba：重新思考状态空间模型中的记忆模式

MemMamba: Rethinking Memory Patterns in State Space Model

摘要

Support