
FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

February 9, 2026
Authors: Annemette Brok Pirchert, Jacob Nielsen, Mogens Henrik From, Lukas Galke Poech, Peter Schneider-Kamp
cs.AI

Abstract

Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts, using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that low-rank adapters may instead be sufficient. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, whose experts may be either full-sized models or adapters of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating 6 experts with ranks from 2^0 to 2^14, resulting in experiments covering 150 mixtures (96 with 2 experts, 54 with 7 experts) that are evaluated across 120 tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression analysis from expert rank to downstream task performance reveals that the best-performing rank is substantially higher for reasoning-heavy benchmarks than for knowledge-heavy benchmarks. These findings on rank sensitivity have direct implications for memory efficiency: using optimal ranks, FlexMoRE yields improved downstream task performance (average score 47.18) compared to the baseline FlexOlmo-style mixture of full-sized experts (average score 45.46) at less than one third of the parameters (10.75B for FlexMoRE vs. 33.27B for FlexOlmo). All code will be made available.
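
The abstract states that FlexOlmo's pre-trained full-sized experts are turned into low-rank versions of a chosen rank. The paper's exact procedure is not given here; the sketch below illustrates one standard way to obtain such a rank-r adapter, namely a truncated SVD of the difference between an expert's weight matrix and the shared base model's weight matrix. All names (`to_low_rank`, `base_weight`, `expert_weight`) are illustrative placeholders, not identifiers from the paper or its code.

```python
# Minimal sketch (assumption, not the paper's confirmed method): approximate an
# expert-specific weight update with a rank-r factorization so the expert can be
# stored as a LoRA-style adapter on top of the shared base model.
import torch


def to_low_rank(base_weight: torch.Tensor,
                expert_weight: torch.Tensor,
                rank: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return factors A (out_dim x rank) and B (rank x in_dim) such that
    base_weight + A @ B approximates expert_weight."""
    delta = expert_weight - base_weight                 # expert-specific update
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank]                          # absorb singular values into A
    B = Vh[:rank, :]
    return A, B


if __name__ == "__main__":
    torch.manual_seed(0)
    d_out, d_in, r = 64, 48, 8
    base = torch.randn(d_out, d_in)
    # Synthetic "expert": base weights plus a genuinely low-rank perturbation.
    expert = base + 0.1 * (torch.randn(d_out, r) @ torch.randn(r, d_in))
    A, B = to_low_rank(base, expert, r)
    err = torch.norm(base + A @ B - expert) / torch.norm(expert)
    print(f"relative reconstruction error at rank {r}: {err:.4f}")
```

In a rank-heterogeneous mixture of this kind, the chosen rank r can differ per expert, which is what lets knowledge-heavy domains use very small adapters while reasoning-heavy domains keep larger (or full-sized) experts.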