2Mamba2Furious: Linear in Complexity, Competitive in Accuracy
February 19, 2026
Authors: Gabriel Mongaras, Eric C. Larson
cs.AI
Abstract
Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive, resulting in reduced accuracy compared to softmax attention. To bridge this accuracy gap, we modify Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific design choices contribute most to its accuracy. From this simplified variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, yielding a method we call 2Mamba that is nearly as accurate as softmax attention yet much more memory efficient at long context lengths. We also investigate elements of Mamba-2 that help it surpass softmax attention in accuracy. Code is provided for all our experiments.
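To make the efficiency contrast concrete, here is a minimal background sketch (not the paper's 2Mamba or Mamba-2S implementation, and the `phi` feature map is an illustrative assumption): softmax attention materializes a T x T score matrix, while a generic causal linear-attention layer can be run as a recurrence over a fixed-size hidden state, so its memory does not grow with context length.

```python
# Minimal sketch contrasting softmax attention with a generic causal
# linear-attention recurrence. Illustrative only; not the paper's method.
import numpy as np

def softmax_attention(Q, K, V):
    # Materializes the full T x T score matrix: quadratic in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention_recurrent(Q, K, V):
    # Causal linear attention as a recurrence: a (d x d_v) state S replaces
    # the T x T attention matrix, so memory is constant in context length.
    T, d = Q.shape
    S = np.zeros((d, V.shape[-1]))   # running sum of outer(k_t, v_t)
    z = np.zeros(d)                  # running sum of k_t (for normalization)
    out = np.zeros_like(V)
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # simple positive feature map (assumed)
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

# Toy usage: both produce (T, d_v) outputs; only the shapes and costs matter here.
T, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
print(softmax_attention(Q, K, V).shape, linear_attention_recurrent(Q, K, V).shape)
```

Mamba-2 refines this basic recurrence with a structured decay mask (the A-mask mentioned above) and other design choices; the paper's contribution is to simplify and then strengthen those components.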