MemMamba: Rethinking Memory Patterns in State Space Model

Abstract

With the explosive growth of data, long-sequence modeling has become increasingly important in tasks such as natural language processing and bioinformatics. However, existing methods face an inherent trade-off between efficiency and memory. Recurrent neural networks suffer from vanishing and exploding gradients, making them hard to scale. Transformers can model global dependencies but are constrained by quadratic complexity. Recently, selective state-space models such as Mamba have demonstrated high efficiency, with O(n) time complexity and O(1) recurrent inference, yet their long-range memory decays exponentially. In this work, we conduct mathematical derivations and an information-theoretic analysis to systematically uncover the memory decay mechanism of Mamba, answering a fundamental question: what is the nature of Mamba's long-range memory, and how does it retain information? To quantify key information loss, we further introduce horizontal-vertical memory fidelity metrics that capture degradation both within and across layers. Inspired by how humans distill and retain salient information when reading long documents, we propose MemMamba, a novel architectural framework that integrates a state summarization mechanism with cross-layer and cross-token attention, alleviating long-range forgetting while preserving linear complexity. MemMamba achieves significant improvements over existing Mamba variants and Transformers on long-sequence benchmarks such as PG19 and Passkey Retrieval, while delivering a 48% speedup in inference efficiency. Both theoretical analysis and empirical results demonstrate that MemMamba achieves a breakthrough in the complexity-memory trade-off, offering a new paradigm for ultra-long sequence modeling.
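
The exponential decay of long-range memory mentioned in the abstract can be illustrated with a deliberately simplified scalar recurrence. This sketch is not Mamba's selective update: it assumes a fixed decay factor and the toy rule h_t = decay * h_{t-1} + x_t, and the names `run_scalar_recurrence` and `impulse_weight` are hypothetical helpers introduced only for this illustration. Under those assumptions, an input seen `distance` steps earlier contributes to the current state with weight decay^distance.

```python
import math


def run_scalar_recurrence(decay: float, inputs: list[float]) -> float:
    """Run h_t = decay * h_{t-1} + x_t from h = 0 and return the final state."""
    state = 0.0
    for x in inputs:
        state = decay * state + x
    return state


def impulse_weight(decay: float, distance: int) -> float:
    """Closed-form weight that an input `distance` steps in the past has on the state."""
    return decay ** distance


if __name__ == "__main__":
    decay = 0.9  # assumed fixed decay; a selective model would make this input-dependent
    for distance in (10, 100, 1000):
        # A single unit input followed by `distance` zero inputs.
        inputs = [1.0] + [0.0] * distance
        final_state = run_scalar_recurrence(decay, inputs)
        # The simulated state should match the closed-form geometric weight.
        assert math.isclose(final_state, impulse_weight(decay, distance), rel_tol=1e-9)
        print(f"distance={distance:5d}  remaining weight={final_state:.3e}")
```

For any decay factor strictly between 0 and 1, the remaining weight shrinks geometrically with distance, which is the kind of forgetting that a state summarization mechanism is meant to counteract.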
