ReMamba: Equip Mamba with Effective Long-Sequence Modeling
August 28, 2024
Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
cs.AI
Abstract
While the Mamba architecture demonstrates superior inference efficiency and
competitive performance on short-context natural language processing (NLP)
tasks, empirical evidence suggests its capacity to comprehend long contexts is
limited compared to transformer-based models. In this study, we investigate the
long-context efficiency issues of the Mamba models and propose ReMamba, which
enhances Mamba's ability to comprehend long contexts. ReMamba incorporates
selective compression and adaptation techniques within a two-stage re-forward
process, incurring minimal additional inference overhead. Experimental
results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy,
improving over the baselines by 3.2 and 1.6 points, respectively, and attaining
performance almost on par with same-size transformer models.
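
The abstract describes ReMamba's two-stage re-forward with selective compression only at a high level. The sketch below illustrates that general idea under stated assumptions: a first pass over the long context produces hidden states, a small scoring head keeps the highest-scoring positions, and the compressed states plus the prompt are re-forwarded. Every name here (ToyBackbone, SelectiveCompressor, two_stage_reforward, keep_ratio) is a hypothetical placeholder for illustration, not the paper's implementation or the Mamba library API.

```python
# Illustrative sketch only; not the authors' method or the Mamba package.
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Stand-in for a sequence backbone: embeds token ids and mixes them.
    A GRU is used purely as a placeholder for Mamba blocks."""
    def __init__(self, vocab_size: int = 1000, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.mixer = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Accepts token ids (long) or already-embedded states (float).
        h = self.embed(x) if x.dtype == torch.long else x
        out, _ = self.mixer(h)
        return out  # (batch, seq_len, hidden_dim)


class SelectiveCompressor(nn.Module):
    """Scores first-pass hidden states and keeps the top-k positions in order."""
    def __init__(self, hidden_dim: int = 64, keep_ratio: float = 0.1):
        super().__init__()
        self.score_head = nn.Linear(hidden_dim, 1)  # importance score per position
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        k = max(1, int(hidden.size(1) * self.keep_ratio))
        scores = self.score_head(hidden).squeeze(-1)               # (batch, seq_len)
        idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values   # preserve original order
        idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        return hidden.gather(1, idx)                               # (batch, k, hidden_dim)


def two_stage_reforward(model, compressor, context_ids, prompt_ids):
    """Stage 1: a single forward pass over the long context yields hidden states.
    Stage 2: re-forward the model on the compressed states plus the prompt."""
    with torch.no_grad():
        first_pass = model(context_ids)
    compressed = compressor(first_pass)
    prompt_states = model.embed(prompt_ids)
    return model(torch.cat([compressed, prompt_states], dim=1))


# Usage: a 2048-token context is compressed to ~10% before the second pass.
model, comp = ToyBackbone(), SelectiveCompressor()
ctx = torch.randint(0, 1000, (1, 2048))
prompt = torch.randint(0, 1000, (1, 16))
out = two_stage_reforward(model, comp, ctx, prompt)
print(out.shape)  # torch.Size([1, 220, 64])
```

Because the second pass sees only the selected positions, its length stays small regardless of the original context length, which is why such a scheme can add little inference overhead; how ReMamba actually scores, compresses, and adapts the states is specified in the paper itself.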