Falcon Mamba: The First Competitive Attention-free 7B Language Model
October 7, 2024
Authors: Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid
cs.AI
Abstract
In this technical report, we present Falcon Mamba 7B, a new base large
language model based on the novel Mamba architecture. Falcon Mamba 7B is
trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure
Mamba-based model, Falcon Mamba 7B surpasses leading open-weight models based
on Transformers, such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par
with Gemma 7B and outperforms models with different architecture designs, such
as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. Currently, Falcon Mamba 7B is
the best-performing Mamba model in the literature at this scale, surpassing
both existing Mamba and hybrid Mamba-Transformer models, according to the Open
LLM Leaderboard. Due to its architecture, Falcon Mamba 7B is significantly
faster at inference and requires substantially less memory for long sequence
generation. Despite recent studies suggesting that hybrid Mamba-Transformer
models outperform pure architecture designs, we demonstrate that even the pure
Mamba design can achieve similar, or even superior results compared to the
Transformer and hybrid designs. We make the weights of our implementation of
Falcon Mamba 7B publicly available on
https://huggingface.co/tiiuae/falcon-mamba-7b, under a permissive license.
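The efficiency claims above follow from the recurrent nature of Mamba layers. As a rough sketch (this is the generic selective state-space recurrence from the Mamba literature, not necessarily Falcon Mamba 7B's exact parametrization), each layer carries a fixed-size hidden state h_t and updates it once per token:

```latex
% Schematic selective SSM recurrence; \bar{A}_t, \bar{B}_t, C_t are
% input-dependent (selective) parameters obtained after discretization.
h_t = \bar{A}_t \, h_{t-1} + \bar{B}_t \, x_t, \qquad y_t = C_t \, h_t
```

Because the state h_t has a fixed size regardless of the position t, generating each new token takes constant time and memory, whereas a Transformer must store and attend over a key-value cache that grows linearly with sequence length. This is the architectural reason for the faster inference and lower long-sequence memory footprint noted in the abstract.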
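For readers who want to try the released checkpoint, the snippet below is a minimal, illustrative sketch of loading it with the Hugging Face transformers library; it is not taken from the report. It assumes a recent transformers release with FalconMamba support, plus torch and accelerate installed, and the dtype and generation settings are placeholders rather than recommended values.

```python
# Minimal sketch: load the released Falcon Mamba 7B weights and generate text.
# Assumes a recent transformers version (with FalconMamba support), torch, accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # repository named in the report

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice; pick per your hardware
    device_map="auto",           # requires accelerate; places weights automatically
)

prompt = "State-space models differ from Transformers in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is attention-free, generation does not build a growing key-value cache, so memory use during long generations stays roughly flat.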