
ReMamba: Equip Mamba with Effective Long-Sequence Modeling

August 28, 2024
Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
cs.AI

Abstract

While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.
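
Based only on the abstract, the core idea is a two-stage re-forward: a first pass over the long context, selective compression of the resulting hidden states, then a second pass over the shortened sequence. The sketch below illustrates that flow under stated assumptions; the function names, the norm-based importance score, and the fixed keep ratio are illustrative placeholders, and the adaptation step that feeds the compressed states back into Mamba's state space is omitted entirely.

```python
# Minimal sketch of a two-stage re-forward with selective compression.
# All interfaces here (model, embed, the scoring rule) are assumptions for
# illustration, not the authors' ReMamba implementation.
import torch

def selective_compress(hidden_states: torch.Tensor,
                       scores: torch.Tensor,
                       keep_ratio: float = 0.1) -> torch.Tensor:
    """Keep the top-scoring fraction of positions, preserving their order."""
    seq_len = hidden_states.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    top_idx = torch.topk(scores, k).indices.sort().values
    return hidden_states[top_idx]

def two_stage_reforward(model, embed, tokens: torch.Tensor,
                        keep_ratio: float = 0.1) -> torch.Tensor:
    # Stage 1: ordinary forward pass over the full long context.
    h = model(embed(tokens))                    # (seq_len, d_model)
    # Score each position; a simple norm is used here as a stand-in
    # for whatever selection signal the paper actually learns.
    scores = h.norm(dim=-1)
    # Selective compression: retain only the highest-scoring states.
    compressed = selective_compress(h, scores, keep_ratio)
    # Stage 2: re-forward on the compressed sequence (adaptation omitted).
    return model(compressed)

if __name__ == "__main__":
    # Toy usage with stand-in modules; a real setup would use a pretrained Mamba.
    model = torch.nn.Identity()
    embed = torch.nn.Embedding(1000, 16)
    tokens = torch.randint(0, 1000, (512,))
    out = two_stage_reforward(model, embed, tokens, keep_ratio=0.1)
    print(out.shape)  # (51, 16): ~10% of the 512 positions are kept
```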

