Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
January 5, 2026
Authors: Falcon LLM Team, Iheb Chaabane, Puneesh Khanna, Suhail Mohmad, Slim Frikha, Shi Hu, Abdalgader Abubaker, Reda Alami, Mikhail Lubinets, Mohamed El Amine Seddik, Hakim Hacid
cs.AI
Abstract
This work introduces Falcon-H1R, a 7B-parameter reasoning-optimized model that establishes the feasibility of achieving competitive reasoning performance with small language models (SLMs). Falcon-H1R stands out for its parameter efficiency, consistently matching or outperforming state-of-the-art reasoning models that are 2× to 7× larger across a variety of reasoning-intensive benchmarks. These results underscore the importance of careful data curation and targeted training strategies (via both efficient SFT and RL scaling) in delivering significant performance gains without increasing model size. Furthermore, Falcon-H1R advances the three-dimensional frontier of reasoning efficiency by combining faster inference (through its hybrid-parallel architecture design), token efficiency, and higher accuracy. This unique blend makes Falcon-H1R-7B a practical backbone for scaling advanced reasoning systems, particularly in scenarios requiring extensive chain-of-thought generation and parallel test-time scaling. Leveraging the recently introduced DeepConf approach, Falcon-H1R achieves state-of-the-art test-time scaling efficiency, offering substantial improvements in both accuracy and computational cost. As a result, Falcon-H1R demonstrates that compact models, through targeted model training and architectural choices, can deliver robust and scalable reasoning performance.
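To make the parallel test-time-scaling setup concrete, here is a minimal sketch of confidence-filtered majority voting in the spirit of DeepConf: many reasoning traces are sampled in parallel, each trace is scored by a confidence signal (e.g. its mean token log-probability), low-confidence traces are discarded, and the final answer is the majority vote over the survivors. The function name, the `(answer, confidence)` pairing, and the `keep_ratio` parameter are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter


def deepconf_vote(candidates, keep_ratio=0.5):
    """Confidence-filtered majority vote over parallel reasoning traces.

    candidates: list of (answer, confidence) pairs, where confidence is a
    per-trace score such as mean token log-probability (hypothetical signal).
    keep_ratio: fraction of the highest-confidence traces retained for voting.
    """
    # Rank traces from most to least confident.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Keep only the top fraction of traces (at least one).
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    # Majority vote over the retained answers.
    votes = Counter(answer for answer, _ in kept)
    return votes.most_common(1)[0][0]


# Example: five sampled traces; low-confidence traces that answered "17"
# are filtered out before voting.
traces = [("42", -0.10), ("42", -0.20), ("17", -3.00),
          ("42", -0.15), ("17", -2.50)]
print(deepconf_vote(traces, keep_ratio=0.5))  # → 42
```

The appeal of this style of filtering is that it spends no extra compute on a verifier model: the confidence signal falls out of generation for free, so discarding weak traces reduces the effective voting cost while typically improving accuracy over plain self-consistency.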