Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Abstract

In this report, we introduce Falcon-H1, a new series of large language models (LLMs) featuring hybrid architecture designs optimized for both high performance and efficiency across diverse use cases. Unlike earlier Falcon models, which were built solely on either Transformer or Mamba architectures, Falcon-H1 adopts a parallel hybrid approach that combines Transformer-based attention with State Space Models (SSMs), which are known for superior long-context memory and computational efficiency. We systematically revisited model design, data strategy, and training dynamics, challenging conventional practices in the field. Falcon-H1 is released in multiple configurations, including base and instruction-tuned variants at 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B parameters. Quantized instruction-tuned models are also available, for a total of over 30 checkpoints on the Hugging Face Hub. Falcon-H1 models demonstrate state-of-the-art performance and exceptional parameter and training efficiency. The flagship Falcon-H1-34B matches or outperforms models up to 70B scale, such as Qwen3-32B, Qwen2.5-72B, and Llama3.3-70B, while using fewer parameters and less data. Smaller models show similar trends: Falcon-H1-1.5B-Deep rivals current leading 7B-10B models, and Falcon-H1-0.5B performs comparably to typical 7B models from 2024. These models excel across reasoning, mathematics, multilingual tasks, instruction following, and scientific knowledge. With support for up to 256K context tokens and 18 languages, Falcon-H1 is suitable for a wide range of applications. All models are released under a permissive open-source license, underscoring our commitment to accessible and impactful AI research.
