Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
May 31, 2024
Authors: Tri Dao, Albert Gu
cs.AI
Abstract
While Transformers have been the main architecture behind deep learning's
success in language modeling, state-space models (SSMs) such as Mamba have
recently been shown to match or outperform Transformers at small to medium
scale. We show that these families of models are actually quite closely
related, and develop a rich framework of theoretical connections between SSMs
and variants of attention, connected through various decompositions of a
well-studied class of structured semiseparable matrices. Our state space
duality (SSD) framework allows us to design a new architecture (Mamba-2) whose
core layer is a refinement of Mamba's selective SSM that is 2-8X faster,
while continuing to be competitive with Transformers on language modeling.
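
To make the SSM/attention connection concrete, the sketch below (not the authors' code; variable names and sizes are illustrative) checks numerically that a selective SSM with per-step scalar decay — the SSD setting — can be computed either as a linear-time recurrence or as multiplication by a lower-triangular 1-semiseparable matrix, which is the attention-like "matrix form" the abstract refers to.

```python
# Minimal sketch, assuming the SSD case where A_t is a scalar a_t:
#     h_t = a_t * h_{t-1} + B_t x_t,   y_t = C_t^T h_t
# is equivalent to y = M x, where M is lower-triangular semiseparable with
#     M[t, s] = (C_t^T B_s) * a_{s+1} * ... * a_t.
import numpy as np

T, N = 6, 4                      # sequence length, state dimension (illustrative)
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, T)     # per-step scalar decay a_t
B = rng.standard_normal((T, N))  # input projections B_t
C = rng.standard_normal((T, N))  # output projections C_t
x = rng.standard_normal(T)       # single input channel for simplicity

# Recurrent (linear-time) form.
h = np.zeros(N)
y_rec = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Matrix (attention-like, quadratic) form: y = M x with semiseparable M.
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        decay = np.prod(a[s + 1:t + 1])   # empty product = 1 when s == t
        M[t, s] = (C[t] @ B[s]) * decay
y_mat = M @ x

assert np.allclose(y_rec, y_mat)  # both views give the same sequence transformation
```

The two code paths trade off the same way the paper's dual algorithms do: the recurrence is linear in sequence length, while the explicit matrix form exposes the masked, attention-like structure that SSD exploits for hardware-efficient block decompositions.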