Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
January 5, 2026
Authors: Falcon LLM Team, Iheb Chaabane, Puneesh Khanna, Suhail Mohmad, Slim Frikha, Shi Hu, Abdalgader Abubaker, Reda Alami, Mikhail Lubinets, Mohamed El Amine Seddik, Hakim Hacid
cs.AI
Abstract
This work introduces Falcon-H1R, a 7B-parameter reasoning-optimized model that establishes the feasibility of achieving competitive reasoning performance with small language models (SLMs). Falcon-H1R stands out for its parameter efficiency, consistently matching or outperforming state-of-the-art reasoning models that are 2× to 7× larger across a variety of reasoning-intensive benchmarks. These results underscore the importance of careful data curation and targeted training strategies (via both efficient SFT and RL scaling) in delivering significant performance gains without increasing model size. Furthermore, Falcon-H1R advances the three-dimensional frontier of reasoning efficiency by combining faster inference (through its hybrid-parallel architecture design), token efficiency, and higher accuracy. This unique blend makes Falcon-H1R-7B a practical backbone for scaling advanced reasoning systems, particularly in scenarios requiring extensive chain-of-thought generation and parallel test-time scaling. Leveraging the recently introduced DeepConf approach, Falcon-H1R achieves state-of-the-art test-time scaling efficiency, offering substantial improvements in both accuracy and computational efficiency. As a result, Falcon-H1R demonstrates that compact models, through targeted model training and architectural choices, can deliver robust and scalable reasoning performance.
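The abstract references parallel test-time scaling with the DeepConf approach but does not spell out the mechanics. Below is a minimal, hypothetical Python sketch of confidence-filtered parallel voting in that spirit: sample many reasoning traces, score each by its weakest stretch of token log-probabilities, keep only the most confident ones, and take a confidence-weighted majority vote. The Trace class, the sliding-window confidence metric, and the keep_ratio threshold are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    answer: str                   # final answer extracted from one reasoning trace
    token_logprobs: List[float]   # per-token log-probabilities along the trace


def trace_confidence(trace: Trace, window: int = 128) -> float:
    """Score a trace by its weakest sliding window of mean token log-probability.

    A single low-confidence stretch is enough to down-rank the whole chain of
    thought; the exp() maps the score into (0, 1] so it can serve as a positive
    vote weight. The window size is an illustrative choice, not the paper's.
    """
    lps = trace.token_logprobs
    if not lps:
        return 0.0
    if len(lps) <= window:
        return math.exp(sum(lps) / len(lps))
    worst = min(
        sum(lps[i:i + window]) / window
        for i in range(len(lps) - window + 1)
    )
    return math.exp(worst)


def confidence_weighted_vote(traces: List[Trace], keep_ratio: float = 0.5) -> str:
    """Drop the least confident traces, then take a confidence-weighted majority vote."""
    ranked = sorted(traces, key=trace_confidence, reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    votes = defaultdict(float)
    for t in kept:
        votes[t.answer] += trace_confidence(t)  # weight each vote by trace confidence
    return max(votes, key=votes.get)


# Usage (assumed workflow): sample N traces in parallel from the model,
# collect each trace's final answer and token log-probabilities, then:
#   best = confidence_weighted_vote(traces, keep_ratio=0.5)
```

Because low-confidence traces are discarded before voting, compute spent on weak chains of thought can also be cut short, which is where the reported accuracy and cost gains of confidence-based test-time scaling come from.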