

FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

February 9, 2026
Authors: Annemette Brok Pirchert, Jacob Nielsen, Mogens Henrik From, Lukas Galke Poech, Peter Schneider-Kamp
cs.AI

Abstract

Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from the other experts, using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that low-rank adapters may instead be sufficient. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, whose experts may be either full-sized experts or adapters of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating 6 experts with ranks from 2^0 to 2^{14}, resulting in experiments covering 150 mixtures (96 with 2 experts, 54 with 7 experts) evaluated across 120 tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression analysis from expert rank to downstream task performance reveals that the best-performing rank is substantially higher for reasoning-heavy benchmarks than for knowledge-heavy benchmarks. These findings on rank sensitivity have direct implications for memory efficiency: using optimal ranks, FlexMoRE yields better downstream task performance (average score 47.18) than the baseline FlexOlmo-style mixture of full-sized experts (average score 45.46) at less than one third of the parameters (10.75B for FlexMoRE vs. 33.27B for FlexOlmo). All code will be made available.
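
To make the architectural idea concrete, the following is a minimal, self-contained PyTorch sketch of a mixture layer whose experts are either a full-sized feed-forward expert or a low-rank adapter of a chosen rank wrapped around a shared base FFN. It illustrates only the general idea described in the abstract; the class names, the token-level softmax router, and the way adapters wrap the base FFN are assumptions made for this example and are not taken from the paper's implementation.

# Illustrative sketch only (not the paper's code): experts of heterogeneous rank,
# where a low-rank adapter approximates a domain expert as a rank-r update
# around a shared, frozen base-model FFN. All names here are assumptions.
import torch
import torch.nn as nn


class LowRankAdapterExpert(nn.Module):
    """Approximates a domain expert with a rank-r update on top of a shared base FFN."""

    def __init__(self, base_ffn: nn.Module, d_model: int, rank: int):
        super().__init__()
        self.base_ffn = base_ffn                              # shared base-model FFN
        self.down = nn.Linear(d_model, rank, bias=False)      # d_model -> r
        self.up = nn.Linear(rank, d_model, bias=False)        # r -> d_model
        nn.init.zeros_(self.up.weight)                        # start as a zero update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base_ffn(x) + self.up(self.down(x))


class RankHeterogeneousMoE(nn.Module):
    """Mixes experts whose ranks (or full size) may differ from domain to domain."""

    def __init__(self, experts: list[nn.Module], d_model: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(x), dim=-1)           # (..., n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (..., d_model, n_experts)
        return torch.einsum("...dn,...n->...d", outputs, weights)


# Example usage: one full-sized expert plus adapters of different ranks,
# all sharing the same base FFN (hypothetical dimensions and ranks).
d = 64
base = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
moe = RankHeterogeneousMoE(
    [base,                                      # full-sized expert
     LowRankAdapterExpert(base, d, rank=8),     # low-rank domain expert
     LowRankAdapterExpert(base, d, rank=256)],  # higher-rank domain expert
    d_model=d,
)
y = moe(torch.randn(2, 10, d))                  # (batch, seq, d_model)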