
Falcon Mamba: The First Competitive Attention-free 7B Language Model

October 7, 2024
Authors: Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid
cs.AI

Abstract

In this technical report, we present Falcon Mamba 7B, a new base large language model based on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight models based on Transformers, such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with different architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. According to the Open LLM Leaderboard, Falcon Mamba 7B is currently the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba and hybrid Mamba-Transformer models. Due to its architecture, Falcon Mamba 7B is significantly faster at inference and requires substantially less memory for long sequence generation. Although recent studies suggest that hybrid Mamba-Transformer models outperform pure architecture designs, we demonstrate that even the pure Mamba design can achieve similar or even superior results compared with the Transformer and hybrid designs. We make the weights of our implementation of Falcon Mamba 7B publicly available at https://huggingface.co/tiiuae/falcon-mamba-7b, under a permissive license.
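
The memory claim for long sequence generation can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the report: the function names and all hyperparameters are hypothetical placeholders, and the formulas are the usual first-order estimates (a Transformer key/value cache that grows with every generated token, versus a Mamba-style recurrent state whose size is fixed). It ignores weights, activations, and batch size.

```python
"""Back-of-the-envelope inference-memory comparison (illustrative only)."""


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate size of a Transformer key/value cache.

    Each layer stores one key and one value vector per KV head per token,
    so the total grows linearly with seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers, d_inner, d_state, d_conv, bytes_per_elem=2):
    """Approximate size of the recurrent state of a Mamba-style model.

    Each layer keeps an SSM state of shape (d_inner, d_state) plus a short
    convolution buffer of roughly (d_inner, d_conv). Neither term depends
    on how many tokens have been generated.
    """
    return n_layers * d_inner * (d_state + d_conv) * bytes_per_elem


# ASSUMPTION: placeholder hyperparameters chosen for illustration.
# They are NOT the published configuration of any model named above.
TRANSFORMER = dict(n_layers=32, n_kv_heads=8, head_dim=128)
MAMBA = dict(n_layers=64, d_inner=8192, d_state=16, d_conv=4)

if __name__ == "__main__":
    fixed = ssm_state_bytes(**MAMBA)
    for seq_len in (2_048, 8_192, 32_768, 131_072):
        cache = kv_cache_bytes(seq_len, **TRANSFORMER)
        print(f"{seq_len:>7} tokens: KV cache {cache / 2**20:8.1f} MiB, "
              f"SSM state {fixed / 2**20:6.1f} MiB")
```

Under these placeholder settings the cache estimate doubles whenever the sequence length doubles, while the state estimate stays constant. That is the qualitative behavior the abstract refers to; actual figures depend on the real model configurations and the inference implementation.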
