MemMamba: Rethinking Memory Patterns in State Space Model

Abstract

With the explosive growth of data, long-sequence modeling has become increasingly important in tasks such as natural language processing and bioinformatics. However, existing methods face inherent trade-offs between efficiency and memory. Recurrent neural networks suffer from vanishing and exploding gradients, which makes them hard to scale. Transformers can model global dependencies but are constrained by quadratic complexity. Recently, selective state-space models such as Mamba have demonstrated high efficiency, with O(n) time complexity and O(1) recurrent inference, yet their long-range memory decays exponentially. In this work, we conduct mathematical derivations and an information-theoretic analysis to systematically uncover the memory decay mechanism of Mamba, answering a fundamental question: what is the nature of Mamba's long-range memory, and how does it retain information? To quantify the loss of key information, we further introduce horizontal-vertical memory fidelity metrics that capture degradation both within and across layers. Inspired by how humans distill and retain salient information when reading long documents, we propose MemMamba, a novel architectural framework that integrates a state summarization mechanism with cross-layer and cross-token attention, alleviating long-range forgetting while preserving linear complexity. MemMamba achieves significant improvements over existing Mamba variants and Transformers on long-sequence benchmarks such as PG19 and Passkey Retrieval, while delivering a 48% speedup in inference efficiency. Both the theoretical analysis and the empirical results demonstrate that MemMamba achieves a breakthrough in the complexity-memory trade-off, offering a new paradigm for ultra-long sequence modeling.
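The exponential decay of long-range memory mentioned above can be illustrated with a toy recurrence. The following is a minimal sketch, assuming a scalar, time-invariant linear state update with a fixed decay factor `a`; it is not Mamba's selective, input-dependent parameterization, nor the analysis carried out in the paper. The function name `run_recurrence` and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
def run_recurrence(a, b, c, xs):
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over the inputs xs.

    Toy scalar state-space recurrence (an assumption for illustration,
    not the Mamba update rule). Returns the list of outputs y_t.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # constant-size state: O(1) memory per step
        ys.append(c * h)
    return ys


# Feed a single impulse at step 0, followed by zeros.
a, b, c = 0.9, 1.0, 1.0
ys = run_recurrence(a, b, c, [1.0] + [0.0] * 100)

# With |a| < 1, the trace of x_0 left in y_t equals c * a**t * b,
# so it shrinks geometrically as t grows.
for t in (0, 10, 50, 100):
    print(t, ys[t])
```

In this simplified setting, the influence of the first token on the output at step t is proportional to `a**t`, which is the kind of exponential forgetting the abstract refers to; a single pass over the sequence costs O(n) time with a fixed-size state.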
