ReMamba: Equip Mamba with Effective Long-Sequence Modeling
August 28, 2024
Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
cs.AI
Abstract
While the Mamba architecture demonstrates superior inference efficiency and
competitive performance on short-context natural language processing (NLP)
tasks, empirical evidence suggests its capacity to comprehend long contexts is
limited compared to transformer-based models. In this study, we investigate the
long-context efficiency issues of the Mamba models and propose ReMamba, which
enhances Mamba's ability to comprehend long contexts. ReMamba incorporates
selective compression and adaptation techniques within a two-stage re-forward
process, incurring minimal additional inference overhead. Experimental
results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy,
improving over the baselines by 3.2 and 1.6 points, respectively, and attaining
performance almost on par with same-size transformer models.
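
The abstract describes ReMamba's two-stage re-forward with selective compression only at a high level. The sketch below illustrates that general idea under stated assumptions: a first pass over the long context produces hidden states, a small scoring head keeps the highest-scoring positions, and the compressed states plus the prompt are re-forwarded. Every name here (ToyBackbone, SelectiveCompressor, two_stage_reforward, keep_ratio) is a hypothetical placeholder for illustration, not the paper's implementation or the Mamba library API.

```python
# Illustrative sketch only; not the authors' method or the Mamba package.
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Stand-in for a sequence backbone: embeds token ids and mixes them.
    A GRU is used purely as a placeholder for Mamba blocks."""
    def __init__(self, vocab_size: int = 1000, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.mixer = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Accepts token ids (long) or already-embedded states (float).
        h = self.embed(x) if x.dtype == torch.long else x
        out, _ = self.mixer(h)
        return out  # (batch, seq_len, hidden_dim)


class SelectiveCompressor(nn.Module):
    """Scores first-pass hidden states and keeps the top-k positions in order."""
    def __init__(self, hidden_dim: int = 64, keep_ratio: float = 0.1):
        super().__init__()
        self.score_head = nn.Linear(hidden_dim, 1)  # importance score per position
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        k = max(1, int(hidden.size(1) * self.keep_ratio))
        scores = self.score_head(hidden).squeeze(-1)               # (batch, seq_len)
        idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values   # preserve original order
        idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        return hidden.gather(1, idx)                               # (batch, k, hidden_dim)


def two_stage_reforward(model, compressor, context_ids, prompt_ids):
    """Stage 1: a single forward pass over the long context yields hidden states.
    Stage 2: re-forward the model on the compressed states plus the prompt."""
    with torch.no_grad():
        first_pass = model(context_ids)
    compressed = compressor(first_pass)
    prompt_states = model.embed(prompt_ids)
    return model(torch.cat([compressed, prompt_states], dim=1))


# Usage: a 2048-token context is compressed to ~10% before the second pass.
model, comp = ToyBackbone(), SelectiveCompressor()
ctx = torch.randint(0, 1000, (1, 2048))
prompt = torch.randint(0, 1000, (1, 16))
out = two_stage_reforward(model, comp, ctx, prompt)
print(out.shape)  # torch.Size([1, 220, 64])
```

Because the second pass sees only the selected positions, its length stays small regardless of the original context length, which is why such a scheme can add little inference overhead; how ReMamba actually scores, compresses, and adapts the states is specified in the paper itself.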