MemMamba: 状態空間モデルにおけるメモリパターンの再考

要旨

データの爆発的増加に伴い、自然言語処理やバイオインフォマティクスなどのタスクにおいて、長系列モデリングの重要性が高まっています。しかし、既存の手法は効率性とメモリ使用量の間で本質的なトレードオフに直面しています。リカレントニューラルネットワークは勾配消失や爆発の問題を抱えており、スケーリングが困難です。トランスフォーマーはグローバルな依存関係をモデル化できますが、二次的な計算量に制約されます。最近では、Mambaのような選択的状態空間モデルがO(n)の時間計算量とO(1)の再帰的推論で高い効率性を示していますが、長距離メモリが指数関数的に減衰するという課題があります。本研究では、数学的導出と情報理論的分析を通じて、Mambaのメモリ減衰メカニズムを体系的に解明し、Mambaの長距離メモリの本質と情報保持の仕組みという根本的な問いに答えます。主要な情報損失を定量化するため、層内および層間の劣化を捉える水平-垂直メモリ忠実度メトリクスを導入します。人間が長文書を読む際に重要な情報を蒸留し保持する方法に着想を得て、状態要約メカニズムと層間・トークン間アテンションを統合した新しいアーキテクチャフレームワークであるMemMambaを提案します。これにより、線形計算量を維持しつつ長距離の忘却を軽減します。MemMambaは、PG19やPasskey Retrievalなどの長系列ベンチマークにおいて、既存のMambaバリアントやトランスフォーマーを大幅に上回る性能を達成し、推論効率で48%の高速化を実現します。理論分析と実験結果の両方が、MemMambaが計算量とメモリのトレードオフにおいてブレークスルーを達成し、超長系列モデリングの新しいパラダイムを提供することを示しています。

English

With the explosive growth of data, long-sequence modeling has become increasingly important in tasks such as natural language processing and bioinformatics. However, existing methods face inherent trade-offs between efficiency and memory. Recurrent neural networks suffer from gradient vanishing and explosion, making them hard to scale. Transformers can model global dependencies but are constrained by quadratic complexity. Recently, selective state-space models such as Mamba have demonstrated high efficiency with O(n) time and O(1) recurrent inference, yet their long-range memory decays exponentially. In this work, we conduct mathematical derivations and information-theoretic analysis to systematically uncover the memory decay mechanism of Mamba, answering a fundamental question: what is the nature of Mamba's long-range memory and how does it retain information? To quantify key information loss, we further introduce horizontal-vertical memory fidelity metrics that capture degradation both within and across layers. Inspired by how humans distill and retain salient information when reading long documents, we propose MemMamba, a novel architectural framework that integrates state summarization mechanism together with cross-layer and cross-token attention, which alleviates long-range forgetting while preserving linear complexity. MemMamba achieves significant improvements over existing Mamba variants and Transformers on long-sequence benchmarks such as PG19 and Passkey Retrieval, while delivering a 48% speedup in inference efficiency. Both theoretical analysis and empirical results demonstrate that MemMamba achieves a breakthrough in the complexity-memory trade-off, offering a new paradigm for ultra-long sequence modeling.

MemMamba: 状態空間モデルにおけるメモリパターンの再考

MemMamba: Rethinking Memory Patterns in State Space Model

要旨

Support