The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Abstract

Mixture of Experts models are widely assumed to achieve domain specialization through sparse routing. In this work, we question this assumption by introducing COMMITTEEAUDIT, a post hoc framework that analyzes routing behavior at the level of expert groups rather than individual experts. Across three representative models and the MMLU benchmark, we uncover a domain-invariant Standing Committee. This is a compact coalition of routed experts that consistently captures the majority of routing mass across domains, layers, and routing budgets, even when architectures already include shared experts. Qualitative analysis further shows that Standing Committees anchor reasoning structure and syntax, while peripheral experts handle domain-specific knowledge. These findings reveal a strong structural bias toward centralized computation, suggesting that specialization in Mixture of Experts models is far less pervasive than commonly believed. This inherent bias also indicates that current training objectives, such as load-balancing losses that enforce uniform expert utilization, may be working against the model's natural optimization path, thereby limiting training efficiency and performance.
