Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
November 17, 2025
Authors: Shalini Maiti, Amar Budhiraja, Bhavul Gauri, Gaurav Chaurasia, Anton Protopopov, Alexis Audran-Reiss, Michael Slater, Despoina Magka, Tatiana Shavrina, Roberta Raileanu, Yoram Bachrach
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute and careful orchestration of training procedures. Model souping, the practice of averaging the weights of multiple models that share the same architecture, has emerged as a promising pre- and post-training technique that can enhance performance without expensive retraining. In this paper, we introduce Soup Of Category Experts (SoCE), a principled approach to model souping that uses benchmark composition to identify optimal model candidates and applies non-uniform weighted averaging to maximize performance. In contrast to previous uniform-averaging approaches, our method leverages the observation that benchmark categories often exhibit low inter-correlations in model performance. SoCE identifies "expert" models for each weakly correlated category cluster and combines them using optimized, non-uniform weights rather than uniform averaging. We demonstrate that the proposed method improves performance and robustness across multiple domains, including multilingual capabilities, tool calling, and mathematics, and achieves state-of-the-art results on the Berkeley Function Calling Leaderboard.
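To make the recipe described above concrete, the following is a minimal sketch, in Python with NumPy, of what category-expert souping could look like: correlate benchmark categories across candidate models, group weakly correlated categories into clusters, pick an expert model per cluster, and form a non-uniform weighted average of parameter tensors. The model names, scores, correlation threshold, and weight heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of category-expert souping; scores, threshold, and
# weighting heuristic are assumptions for illustration only.
import numpy as np

# Per-model scores on benchmark categories (rows: models, columns: categories).
models = ["model_a", "model_b", "model_c"]
categories = ["multilingual", "tool_calling", "math"]
scores = np.array([
    [0.71, 0.55, 0.62],   # model_a
    [0.58, 0.74, 0.60],   # model_b
    [0.60, 0.57, 0.78],   # model_c
])

# 1) Correlation between categories across models; weakly correlated categories
#    end up in separate clusters, each with its own "expert" model.
corr = np.corrcoef(scores.T)
clusters = []
for j, cat in enumerate(categories):
    placed = False
    for cluster in clusters:
        if all(corr[j, k] >= 0.8 for k in cluster):  # assumed threshold
            cluster.append(j)
            placed = True
            break
    if not placed:
        clusters.append([j])

# 2) Pick the expert model for each cluster (best mean score on its categories).
experts = [int(np.argmax(scores[:, cluster].mean(axis=1))) for cluster in clusters]

# 3) Non-uniform souping weights, here simply proportional to each expert's
#    cluster score (the paper optimizes these weights rather than fixing them).
raw = np.array([scores[m, clusters[i]].mean() for i, m in enumerate(experts)])
weights = raw / raw.sum()

# 4) Weighted average of parameter tensors (toy state dicts of arrays stand in
#    for real model checkpoints with identical architectures).
state_dicts = {name: {"layer.weight": np.random.randn(4, 4)} for name in models}
souped = {
    key: sum(w * state_dicts[models[m]][key] for w, m in zip(weights, experts))
    for key in state_dicts[models[0]]
}

print("clusters:", [[categories[j] for j in c] for c in clusters])
print("experts:", [models[m] for m in experts], "weights:", weights.round(3))
```

Uniform souping corresponds to giving every selected model the same weight; the sketch instead biases the average toward whichever expert is strongest on its own category cluster, which is the kind of non-uniform weighting the abstract attributes to SoCE.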