

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

July 30, 2025
作者: Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, Younes Belkada, Dhia Eddine Rhayem, Guillaume Kunsch, Hakim Hacid, Hamza Yous, Brahim Farhat, Ibrahim Khadraoui, Mugariya Farooq, Giulia Campesan, Ruxandra Cojocaru, Yasser Djilali, Shi Hu, Iheb Chaabane, Puneesh Khanna, Mohamed El Amine Seddik, Ngoc Dung Huynh, Phuc Le Khac, Leen AlQadi, Billel Mokeddem, Mohamed Chami, Abdalgader Abubaker, Mikhail Lubinets, Kacper Piskorski, Slim Frikha
cs.AI

Abstract

In this report, we introduce Falcon-H1, a new series of large language models (LLMs) featuring hybrid architecture designs optimized for both high performance and efficiency across diverse use cases. Unlike earlier Falcon models built solely on Transformer or Mamba architectures, Falcon-H1 adopts a parallel hybrid approach that combines Transformer-based attention with State Space Models (SSMs), known for superior long-context memory and computational efficiency. We systematically revisited model design, data strategy, and training dynamics, challenging conventional practices in the field. Falcon-H1 is released in multiple configurations, including base and instruction-tuned variants at 0.5B, 1.5B, 1.5B-deep, 3B, 7B, and 34B parameters. Quantized instruction-tuned models are also available, totaling over 30 checkpoints on Hugging Face Hub. Falcon-H1 models demonstrate state-of-the-art performance and exceptional parameter and training efficiency. The flagship Falcon-H1-34B matches or outperforms models up to 70B scale, such as Qwen3-32B, Qwen2.5-72B, and Llama3.3-70B, while using fewer parameters and less data. Smaller models show similar trends: the Falcon-H1-1.5B-Deep rivals current leading 7B-10B models, and Falcon-H1-0.5B performs comparably to typical 7B models from 2024. These models excel across reasoning, mathematics, multilingual tasks, instruction following, and scientific knowledge. With support for up to 256K context tokens and 18 languages, Falcon-H1 is suitable for a wide range of applications. All models are released under a permissive open-source license, underscoring our commitment to accessible and impactful AI research.
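To illustrate the "parallel hybrid" idea described above, the following is a minimal conceptual sketch of a block that runs an attention branch and an SSM-style branch on the same normalized input and sums their outputs. It is not the actual Falcon-H1 implementation: the SSM branch here is a simple gated linear recurrence standing in for the real SSM component, and the normalization, gating, and mixing details are assumptions for illustration only.

```python
# Conceptual sketch of a parallel attention + SSM block (not Falcon-H1's real code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelHybridBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Attention branch
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stand-in SSM branch: per-channel gated linear recurrence (placeholder for a real SSM)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.zeros(d_model))
        self.out_proj = nn.Linear(d_model, d_model)

    def ssm_branch(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); elementwise recurrent state with a learned decay
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)  # per-channel decay in (0, 1)
        state = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):
            state = a * state + (1 - a) * u[:, t]
            outs.append(state)
        h = torch.stack(outs, dim=1) * F.silu(gate)
        return self.out_proj(h)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Causal mask so the attention branch behaves like a language model
        mask = nn.Transformer.generate_square_subsequent_mask(h.size(1)).to(h.device)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        ssm_out = self.ssm_branch(h)
        # Parallel combination: both branches see the same input; their outputs are summed
        return x + attn_out + ssm_out
```

The key design point the sketch tries to convey is that the two token mixers operate side by side on the same hidden states, rather than being interleaved in alternating layers.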
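Since the abstract notes that base, instruction-tuned, and quantized checkpoints are published on the Hugging Face Hub, a minimal usage sketch is shown below. The organization name and exact repository id are assumptions (not stated in the abstract); adjust them to an actual released checkpoint, and note that the snippet assumes the models are loadable through transformers' AutoModelForCausalLM.

```python
# Minimal loading sketch; the repo id below is an assumed placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-0.5B-Instruct"  # assumption: replace with a real checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Explain state space models in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```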