Falcon Mamba: The First Competitive Attention-free 7B Language Model

Abstract

In this technical report, we present Falcon Mamba 7B, a new base large language model built on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight Transformer-based models such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with other architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. According to the Open LLM Leaderboard, Falcon Mamba 7B is currently the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba and hybrid Mamba-Transformer models. Due to its architecture, Falcon Mamba 7B is significantly faster at inference and requires substantially less memory for long-sequence generation. Although recent studies suggest that hybrid Mamba-Transformer models outperform pure architecture designs, we demonstrate that even a pure Mamba design can achieve similar or even superior results compared to Transformer and hybrid designs. We make the weights of our implementation of Falcon Mamba 7B publicly available at https://huggingface.co/tiiuae/falcon-mamba-7b under a permissive license.
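
The memory claim for long-sequence generation can be illustrated with a back-of-the-envelope sketch. The Python below compares the per-sequence cache of an attention model, which stores keys and values for every past token, with the fixed-size recurrent state of a Mamba-style model. The function names and all layer counts and dimensions are hypothetical values chosen for illustration; they are not taken from the report, and the state formula is a simplified model rather than the authors' implementation.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Attention KV cache for one sequence.

    Keys and values (factor 2) are stored for every past token in every
    layer, so the size grows linearly with seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers, d_inner, d_state, d_conv, bytes_per_elem=2):
    """Recurrent state for one sequence in a Mamba-style model.

    Simplified assumption: each layer keeps an SSM state of shape
    (d_inner, d_state) and a short convolution buffer of shape
    (d_inner, d_conv). Neither depends on the sequence length.
    """
    return n_layers * d_inner * (d_state + d_conv) * bytes_per_elem


# Hypothetical configurations, for illustration only.
ATTN = dict(n_layers=32, n_kv_heads=8, head_dim=128)
SSM = dict(n_layers=64, d_inner=8192, d_state=16, d_conv=4)

for seq_len in (1024, 8192, 65536):
    kv_mib = kv_cache_bytes(seq_len, **ATTN) / 2**20
    ssm_mib = ssm_state_bytes(**SSM) / 2**20
    print(f"{seq_len:>6} tokens: KV cache {kv_mib:8.1f} MiB, SSM state {ssm_mib:6.1f} MiB")
```

Under these assumptions the attention cache doubles whenever the sequence length doubles, while the recurrent state stays the same size regardless of how many tokens have been generated.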
