2Mamba2Furious: Linear in Complexity, Competitive in Accuracy
February 19, 2026
Authors: Gabriel Mongaras, Eric C. Larson
cs.AI
Abstract
Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive, resulting in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we modify Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention yet much more memory efficient at long context lengths. We also investigate which elements of Mamba-2 help it surpass softmax attention accuracy. Code is provided for all of our experiments.
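For intuition only, here is a minimal sketch of why a linear-attention recurrence with a decaying state is more memory efficient than softmax attention: the former keeps only a d x d state per step, while the latter materializes a T x T score matrix. The shapes, the scalar decay `a`, and all variable names below are illustrative assumptions and are not taken from the paper's implementation; Mamba-2's actual A-mask and the 2Mamba modifications are more elaborate.

```python
# Illustrative sketch (assumed shapes/names, not the paper's code):
# softmax attention vs. a linear-attention recurrence with a scalar decay,
# loosely in the spirit of a decay-based A-mask.
import numpy as np

T, d = 128, 16                      # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# --- Softmax attention: builds a T x T causal score matrix (quadratic in T).
scores = Q @ K.T / np.sqrt(d)
causal = np.tril(np.ones((T, T), dtype=bool))
scores = np.where(causal, scores, -np.inf)
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
out_softmax = probs @ V             # (T, d)

# --- Linear attention with decay: a single d x d running state (linear in T).
a = 0.95                            # assumed per-step decay factor
S = np.zeros((d, d))                # state: sum over s<=t of a^(t-s) * k_s v_s^T
out_linear = np.zeros((T, d))
for t in range(T):
    S = a * S + np.outer(K[t], V[t])   # decay old context, add new key/value pair
    out_linear[t] = Q[t] @ S           # query reads the compressed state

print(out_softmax.shape, out_linear.shape)  # both (T, d)
```

The memory contrast is the point of the sketch: the quadratic path stores the full T x T attention matrix, whereas the recurrent path never stores more than one d x d state regardless of context length.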