2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Abstract

Linear attention transformers have become a strong alternative to softmax attention because of their efficiency. However, linear attention tends to be less expressive and yields lower accuracy than softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. Starting from this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state. The resulting method, which we call 2Mamba, is nearly as accurate as softmax attention, yet much more memory efficient for long context lengths. We also investigate elements of Mamba-2 that help surpass softmax attention accuracy. Code is provided for all our experiments.
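As a rough illustration of why linear attention can be memory efficient at long context lengths, the sketch below computes a generic decay-masked causal linear attention in two equivalent ways. This is not the 2Mamba or Mamba-2S formulation: the function names, the scalar per-step decay `a`, and the tensor shapes are assumptions chosen for illustration only.

```python
import numpy as np

def masked_linear_attention(q, k, v, a):
    """Quadratic form: materializes a full (T, T) decay mask.

    q, k: (T, d); v: (T, d_v); a: (T,) with entries in (0, 1].
    All names and shapes are illustrative assumptions.
    """
    log_a = np.cumsum(np.log(a))  # prefix sums of log-decays
    # mask[i, j] = a[j+1] * ... * a[i] for j <= i, and 0 above the diagonal
    mask = np.tril(np.exp(log_a[:, None] - log_a[None, :]))
    scores = (q @ k.T) * mask     # no softmax: scores stay linear in q and k
    return scores @ v

def recurrent_linear_attention(q, k, v, a):
    """Recurrent form: same output, but only a (d, d_v) state is kept."""
    T, d = q.shape
    d_v = v.shape[1]
    state = np.zeros((d, d_v))    # fixed-size state, independent of T
    out = np.empty((T, d_v))
    for t in range(T):
        # decay the old state, then add the new key-value outer product
        state = a[t] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, d_v = 16, 4, 3
    q = rng.standard_normal((T, d))
    k = rng.standard_normal((T, d))
    v = rng.standard_normal((T, d_v))
    a = rng.uniform(0.5, 1.0, size=T)
    same = np.allclose(masked_linear_attention(q, k, v, a),
                       recurrent_linear_attention(q, k, v, a))
    print("forms agree:", same)
```

The recurrent form carries a state of size `d * d_v` regardless of the sequence length `T`, whereas the quadratic form builds a `T x T` matrix. Softmax attention normalizes over all earlier positions and does not reduce to a fixed-size state in this way, which is the kind of memory trade-off the abstract refers to.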
