SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
March 22, 2024
Authors: Badri N. Patro, Vijay S. Agneeswaran
cs.AI
Abstract
Transformers have widely adopted attention networks for sequence mixing and
MLPs for channel mixing, playing a pivotal role in achieving breakthroughs
across domains. However, recent literature highlights issues with attention
networks, including low inductive bias and quadratic complexity concerning
input sequence length. State Space Models (SSMs) like S4 and others (Hippo,
Global Convolutions, liquid S4, LRU, Mega, and Mamba), have emerged to address
the above issues to help handle longer sequence lengths. Mamba, while being the
state-of-the-art SSM, has a stability issue when scaled to large networks for
computer vision datasets. We propose SiMBA, a new architecture that introduces
Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations
and uses the Mamba block for sequence modeling. Extensive performance studies
across image and time-series benchmarks demonstrate that SiMBA outperforms
existing SSMs, bridging the performance gap with state-of-the-art transformers.
Notably, SiMBA establishes itself as the new state-of-the-art SSM on ImageNet,
on transfer learning benchmarks such as Stanford Cars and Flowers, on task
learning benchmarks, and on seven time series benchmark datasets. The project
page is available at https://github.com/badripatro/Simba.
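
To make the architecture described in the abstract concrete, below is a minimal sketch of one SiMBA-style block: a Mamba block for sequence (token) mixing paired with frequency-domain channel mixing. This is an illustration under stated assumptions, not the authors' implementation: the class names `SiMBABlock` and `EinFFTChannelMixer`, the pre-norm residual layout, and the use of the `mamba_ssm` package are choices made here, and the paper's EinFFT, which applies learned transformations to the spectral representation of the channels, is reduced to a single learned complex weighting.

```python
# Minimal sketch of a SiMBA-style block (assumptions: PyTorch and the
# mamba-ssm package are available; names and layout are illustrative).
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective state space sequence mixer


class EinFFTChannelMixer(nn.Module):
    """Simplified frequency-domain channel mixing: real FFT over the channel
    axis, a learned complex spectral weighting, then inverse FFT.
    (The paper's EinFFT uses richer learned spectral transforms.)"""

    def __init__(self, dim: int):
        super().__init__()
        freq_bins = dim // 2 + 1  # rfft output size along the channel axis
        self.weight = nn.Parameter(
            0.02 * torch.randn(freq_bins, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        x_f = torch.fft.rfft(x, dim=-1)      # channels -> frequency domain
        x_f = x_f * self.weight              # learned spectral modulation
        return torch.fft.irfft(x_f, n=x.shape[-1], dim=-1)


class SiMBABlock(nn.Module):
    """Sequence mixing with a Mamba block, channel mixing with the
    simplified EinFFT above, each in a pre-norm residual branch."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.seq_mixer = Mamba(d_model=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mixer = EinFFTChannelMixer(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.seq_mixer(self.norm1(x))      # token/sequence mixing
        x = x + self.channel_mixer(self.norm2(x))  # channel mixing
        return x
```

Inputs are token sequences of shape (batch, seq_len, dim), e.g. flattened image patches for the ImageNet experiments or lookback windows for the time series benchmarks; a full model would stack several such blocks between an embedding layer and a task head.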