Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
May 31, 2024
Authors: Tri Dao, Albert Gu
cs.AI
Abstract
While Transformers have been the main architecture behind deep learning's
success in language modeling, state-space models (SSMs) such as Mamba have
recently been shown to match or outperform Transformers at small to medium
scale. We show that these families of models are actually quite closely
related, and develop a rich framework of theoretical connections between SSMs
and variants of attention, connected through various decompositions of a
well-studied class of structured semiseparable matrices. Our state space
duality (SSD) framework allows us to design a new architecture (Mamba-2) whose
core layer is a refinement of Mamba's selective SSM that is 2-8X faster,
while continuing to be competitive with Transformers on language modeling.
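As a rough illustration of the duality the abstract refers to (not the paper's own code), the sketch below computes the output of a scalar-decay selective SSM in two equivalent ways: a linear-time recurrence, and a quadratic "attention-like" form that materializes the lower-triangular 1-semiseparable matrix. The single-channel setting, variable names (a, B, C, u), and shapes are simplifying assumptions for clarity.

```python
# Minimal sketch of the two "dual" computations described by the SSD framework,
# assuming a scalar per-step decay a_t (the Mamba-2 setting) and one channel.
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4
a = rng.uniform(0.5, 1.0, size=T)      # per-step scalar decay A_t (data-dependent)
B = rng.standard_normal((T, N))        # input projections B_t
C = rng.standard_normal((T, N))        # output projections C_t
u = rng.standard_normal(T)             # input sequence

# 1) Linear-time recurrent form: h_t = a_t * h_{t-1} + B_t * u_t,  y_t = C_t . h_t
h = np.zeros(N)
y_recurrent = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * u[t]
    y_recurrent[t] = C[t] @ h

# 2) Quadratic "attention-like" form: materialize the lower-triangular
#    semiseparable matrix M with M[t, s] = (prod_{k=s+1..t} a_k) * (C_t . B_s)
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        decay = np.prod(a[s + 1:t + 1])  # cumulative decay between steps s and t
        M[t, s] = decay * (C[t] @ B[s])
y_matrix = M @ u

print(np.allclose(y_recurrent, y_matrix))  # True: both forms compute the same map
```

Both paths realize the same sequence transformation; the paper's contribution is to exploit this equivalence (and block decompositions of the semiseparable matrix) for faster hardware-friendly algorithms.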