Falcon Mamba: The First Competitive Attention-free 7B Language Model

Abstract

In this technical report, we present Falcon Mamba 7B, a new base large language model built on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight Transformer-based models such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with other architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. According to the Open LLM Leaderboard, Falcon Mamba 7B is currently the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba models and hybrid Mamba-Transformer models. Due to its architecture, Falcon Mamba 7B is significantly faster at inference and requires substantially less memory for long-sequence generation. Although recent studies suggest that hybrid Mamba-Transformer models outperform pure architecture designs, we demonstrate that even a pure Mamba design can achieve results similar or superior to those of Transformer and hybrid designs. We make the weights of our implementation of Falcon Mamba 7B publicly available at https://huggingface.co/tiiuae/falcon-mamba-7b under a permissive license.
