HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

Abstract

Sparse Mixture-of-Experts (MoE) layers route tokens through a handful of experts, and learning-free compression of these layers reduces inference cost without retraining. A subtle obstruction blocks every existing compressor in this family: three experts can be pairwise compatible yet form an irreducible cycle when merged together, so any score that ranks experts on pairwise signals is structurally blind to which triples are jointly mergeable. We show that the obstruction is a precise mathematical object: the harmonic kernel of the simplicial Laplacian on a 2-complex whose vertices are experts, whose edges carry KL merge barriers, and whose faces carry triplet barriers. Hodge-decomposing the edge-barrier signal isolates this kernel exactly. We turn the diagnostic into a selection objective: HodgeCover greedily covers the harmonic-critical edges and triplet-critical triangles, and a hybrid variant pairs it with off-the-shelf weight pruning on the surviving experts. On three open-weight sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and uniquely balances retained mass across all four Hodge components. These results show that exposing the harmonic kernel of a learned MoE structure changes which compressor wins in the regime that matters most.
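To make the Hodge-decomposition step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy complex of four experts with five edges and a single filled triangle; the edge values standing in for merge barriers are invented for illustration, and the helper names `boundary_matrices` and `hodge_decompose` are hypothetical. The cycle 1-2-3 is deliberately left unfilled, so the complex has a one-dimensional harmonic space, which is the kind of pairwise-invisible cycle the abstract describes.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Build B1 (vertices x edges) and B2 (edges x triangles).

    Edges (i, j) with i < j are oriented i -> j; triangles (i, j, k)
    with i < j < k have boundary (i, j) + (j, k) - (i, k).
    """
    edge_index = {e: col for col, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for col, (i, j) in enumerate(edges):
        B1[i, col] = -1.0
        B1[j, col] = 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        B2[edge_index[(i, j)], col] = 1.0
        B2[edge_index[(j, k)], col] = 1.0
        B2[edge_index[(i, k)], col] = -1.0
    return B1, B2

def hodge_decompose(f, B1, B2):
    """Split an edge signal f into gradient, curl, and harmonic parts."""
    # Orthogonal projection of f onto im(B1^T): the gradient part.
    grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]
    # Orthogonal projection of f onto im(B2): the curl part.
    curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]
    # The remainder lies in ker(B1) ∩ ker(B2^T), the kernel of the
    # simplicial Laplacian L1 = B1^T B1 + B2 B2^T.
    harm = f - grad - curl
    return grad, curl, harm

# Toy complex (assumed for illustration): 4 experts, 5 edges, 1 filled face.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]          # cycle 1-2-3 is intentionally NOT a face
B1, B2 = boundary_matrices(4, edges, triangles)

# Hypothetical edge-barrier values, one per edge (not from the paper).
f = np.array([0.20, 0.30, 0.25, 0.40, 0.35])
grad, curl, harm = hodge_decompose(f, B1, B2)

# Dimension of the harmonic space: |E| - rank(B1) - rank(B2).
harmonic_dim = len(edges) - np.linalg.matrix_rank(B1) - np.linalg.matrix_rank(B2)
print("harmonic dimension:", harmonic_dim)
print("harmonic component per edge:", np.round(harm, 4))
```

Edges with a large harmonic magnitude sit on cycles that no filled triangle explains. In this sketch such edges would be natural candidates for the "harmonic-critical" set, although the exact criticality criterion HodgeCover uses is not stated in the abstract.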