Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
July 30, 2025
作者: Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, Younes Belkada, Dhia Eddine Rhayem, Guillaume Kunsch, Hakim Hacid, Hamza Yous, Brahim Farhat, Ibrahim Khadraoui, Mugariya Farooq, Giulia Campesan, Ruxandra Cojocaru, Yasser Djilali, Shi Hu, Iheb Chaabane, Puneesh Khanna, Mohamed El Amine Seddik, Ngoc Dung Huynh, Phuc Le Khac, Leen AlQadi, Billel Mokeddem, Mohamed Chami, Abdalgader Abubaker, Mikhail Lubinets, Kacper Piskorski, Slim Frikha
cs.AI
Abstract
In this report, we introduce Falcon-H1, a new series of large language models
(LLMs) featuring hybrid architecture designs optimized for both high
performance and efficiency across diverse use cases. Unlike earlier Falcon
models built solely on Transformer or Mamba architectures, Falcon-H1 adopts a
parallel hybrid approach that combines Transformer-based attention with State
Space Models (SSMs), known for superior long-context memory and computational
efficiency. We systematically revisited model design, data strategy, and
training dynamics, challenging conventional practices in the field. Falcon-H1
is released in multiple configurations, including base and instruction-tuned
variants at 0.5B, 1.5B, 1.5B-deep, 3B, 7B, and 34B parameters. Quantized
instruction-tuned models are also available, totaling over 30 checkpoints on
Hugging Face Hub. Falcon-H1 models demonstrate state-of-the-art performance and
exceptional parameter and training efficiency. The flagship Falcon-H1-34B
matches or outperforms models up to 70B scale, such as Qwen3-32B, Qwen2.5-72B,
and Llama3.3-70B, while using fewer parameters and less data. Smaller models
show similar trends: the Falcon-H1-1.5B-Deep rivals current leading 7B-10B
models, and Falcon-H1-0.5B performs comparably to typical 7B models from 2024.
These models excel across reasoning, mathematics, multilingual tasks,
instruction following, and scientific knowledge. With support for up to 256K
context tokens and 18 languages, Falcon-H1 is suitable for a wide range of
applications. All models are released under a permissive open-source license,
underscoring our commitment to accessible and impactful AI research.
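The abstract describes Falcon-H1's parallel hybrid design, in which Transformer-based attention and an SSM path process the hidden states side by side within each block. The sketch below only illustrates that general idea and is not the Falcon-H1 implementation: the toy diagonal-recurrence SSM, the concatenate-then-project mixing, and all dimensions are assumptions chosen for brevity.

```python
# Illustrative sketch of a "parallel hybrid" decoder block (PyTorch).
# NOT the Falcon-H1 architecture: the toy SSM recurrence, the causal
# attention setup, and the concat-then-project mixing are assumptions.
import torch
import torch.nn as nn


class ToySSM(nn.Module):
    """Minimal diagonal linear state-space layer standing in for a Mamba-style SSM."""

    def __init__(self, d_model: int, d_state: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_state, d_model)
        self.decay_logit = nn.Parameter(torch.zeros(d_state))  # per-channel decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        u = self.in_proj(x)                        # (B, T, d_state)
        a = torch.sigmoid(self.decay_logit)        # decay in (0, 1)
        h = torch.zeros(u.size(0), u.size(2), device=u.device, dtype=u.dtype)
        states = []
        for t in range(u.size(1)):                 # sequential scan: h_t = a * h_{t-1} + u_t
            h = a * h + u[:, t]
            states.append(h)
        return self.out_proj(torch.stack(states, dim=1))


class ParallelHybridBlock(nn.Module):
    """Attention and SSM paths run in parallel on the same normalized input."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = ToySSM(d_model)
        self.mix = nn.Linear(2 * d_model, d_model)  # combine the two paths

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        seq_len = h.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=h.device), diagonal=1
        )
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        ssm_out = self.ssm(h)
        return x + self.mix(torch.cat([attn_out, ssm_out], dim=-1))


if __name__ == "__main__":
    block = ParallelHybridBlock(d_model=64)
    y = block(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

The key point of the sketch is that both paths see the same input and contribute to the same residual update, rather than alternating attention and SSM layers sequentially; how Falcon-H1 actually mixes the two paths is detailed in the report itself.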
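Since the released checkpoints are hosted on the Hugging Face Hub, a minimal usage sketch with the standard transformers text-generation API is shown below. The repository id is an assumption written for illustration (tiiuae/Falcon-H1-0.5B-Instruct) and should be checked against the published model cards; a transformers version recent enough to include Falcon-H1 support is also assumed.

```python
# Minimal usage sketch, assuming the checkpoints load through the standard
# transformers interface. The repo id below is an assumed example; check the
# actual model cards on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-0.5B-Instruct"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Explain, in one sentence, why state space models scale well to long contexts."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```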