SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Abstract

Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, playing a pivotal role in achieving breakthroughs across domains. However, recent literature highlights issues with attention networks, including low inductive bias and quadratic complexity in the input sequence length. State Space Models (SSMs) such as S4 and others (Hippo, Global Convolutions, liquid S4, LRU, Mega, and Mamba) have emerged to address these issues and to handle longer sequence lengths. Mamba, while being the state-of-the-art SSM, has a stability issue when scaled to large networks for computer vision datasets. We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling. Extensive performance studies across image and time-series benchmarks demonstrate that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers. Notably, SiMBA establishes itself as the new state-of-the-art SSM on ImageNet, on transfer learning benchmarks such as Stanford Car and Flower, on task learning benchmarks, and on seven time-series benchmark datasets. The project page is available at https://github.com/badripatro/Simba.
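To make the contrast between attention's quadratic cost and the linear cost of an SSM concrete, the following is a minimal sketch of a scalar, time-invariant state space model. It is not SiMBA, EinFFT, or the Mamba block, and the function names `ssm_scan` and `ssm_conv` and the parameters `a`, `b`, `c` are illustrative assumptions rather than anything taken from the paper. It only shows the general idea that a linear recurrence processes a sequence in one pass, and that the same model can be written as a causal convolution.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Recurrent form: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the cost grows linearly with its length.
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t  # update the hidden state
        y.append(c * h)      # read out the output
    return y


def ssm_conv(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Convolutional form of the same model, with kernel K_k = c * a**k * b.

    Written as a direct (quadratic) sum purely to check equivalence with
    ssm_scan; it is not meant to be efficient.
    """
    n = len(x)
    kernel = [c * (a ** k) * b for k in range(n)]
    return [sum(kernel[k] * x[t - k] for k in range(t + 1)) for t in range(n)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    print(ssm_scan(signal, a=0.5, b=1.0, c=2.0))
    print(ssm_conv(signal, a=0.5, b=1.0, c=2.0))
```

Both functions produce the same outputs for the same inputs. Real SSM layers use vector or matrix-valued states rather than a single scalar, so this sketch should be read only as an illustration of the recurrence itself.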
