ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Abstract

While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests that its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy: it improves over the baselines by 3.2 and 1.6 points, respectively, and attains performance almost on par with same-size transformer models.
