HodgeCover: Sparse Mixture-of-Experts Compression Driven by Higher-Order Topological Coverage

Abstract

Sparse Mixture-of-Experts (MoE) layers route each token through a handful of experts, and learning-free compression of these layers reduces inference cost without retraining. A subtle obstruction blocks every existing compressor in this family: three experts can be pairwise compatible yet form an irreducible cycle when merged together, so any score that ranks experts on pairwise signals is structurally blind to which triples are jointly mergeable. We show that the obstruction is a precise mathematical object: the harmonic kernel of the simplicial Laplacian on a 2-complex whose vertices are experts, whose edges carry KL merge barriers, and whose faces carry triplet barriers. Hodge-decomposing the edge-barrier signal isolates this kernel exactly. We turn the diagnostic into a selection objective: HodgeCover greedily covers the harmonic-critical edges and triplet-critical triangles, and a hybrid variant pairs it with off-the-shelf weight pruning of the surviving experts. On three open-weight Sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and is the only method that balances retained mass across all four Hodge components. These results show that exposing the harmonic kernel of a learned MoE structure changes which compressor wins in the aggressive-compression regime, where it matters most.
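The Hodge decomposition behind the diagnostic can be illustrated on a toy three-expert complex. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the edge values are made-up stand-ins for KL merge barriers, and the helpers boundary_matrices, hodge_decompose, and hodge_laplacian are hypothetical names. It splits an edge signal into gradient, curl, and harmonic parts by orthogonal projection (valid because B1 B2 = 0) and shows that a circulation around an unfilled triangle lands in the harmonic kernel of L1 = B1^T B1 + B2 B2^T, while filling that triangle moves the same circulation into the curl part.

```python
import numpy as np


def boundary_matrices(n_vertices, edges, triangles):
    """Build incidence (boundary) matrices for a toy 2-complex.

    Hypothetical helper: edges are sorted tuples (i, j) with i < j, oriented
    i -> j; triangles are sorted tuples (i, j, k) with i < j < k.
    Returns B1 (vertices x edges) and B2 (edges x triangles).
    """
    edge_index = {e: idx for idx, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for idx, (i, j) in enumerate(edges):
        B1[i, idx] = -1.0  # edge leaves vertex i
        B1[j, idx] = 1.0   # edge enters vertex j
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        # Oriented boundary of [i, j, k] is [j, k] - [i, k] + [i, j].
        B2[edge_index[(j, k)], t] = 1.0
        B2[edge_index[(i, k)], t] = -1.0
        B2[edge_index[(i, j)], t] = 1.0
    return B1, B2


def hodge_laplacian(B1, B2):
    """Edge (1-)Hodge Laplacian L1 = B1^T B1 + B2 B2^T."""
    return B1.T @ B1 + B2 @ B2.T


def hodge_decompose(f, B1, B2):
    """Split an edge signal f into gradient + curl + harmonic parts.

    im(B1^T) and im(B2) are orthogonal because B1 @ B2 = 0, so each part is
    an orthogonal projection obtained by least squares; the remainder lies
    in ker(L1), the harmonic kernel.
    """
    phi, *_ = np.linalg.lstsq(B1.T, f, rcond=None)
    grad = B1.T @ phi
    if B2.shape[1] > 0:
        psi, *_ = np.linalg.lstsq(B2, f, rcond=None)
        curl = B2 @ psi
    else:
        curl = np.zeros_like(f)  # no filled faces -> no curl component
    harmonic = f - grad - curl
    return grad, curl, harmonic


if __name__ == "__main__":
    # Three experts, all pairs connected; values are illustrative only.
    edges = [(0, 1), (0, 2), (1, 2)]
    # Gradient part [1, 2, 1] (vertex potentials 0, 1, 2) plus a
    # circulation [1, -1, 1] around the cycle 0 -> 1 -> 2 -> 0.
    f = np.array([2.0, 1.0, 2.0])

    # Triangle unfilled: the circulation cannot be written as a difference
    # of vertex potentials, so it survives as the harmonic part
    # (grad ~ [1, 2, 1], harmonic ~ [1, -1, 1]).
    B1, B2 = boundary_matrices(3, edges, [])
    grad, curl, harm = hodge_decompose(f, B1, B2)
    print("unfilled:", grad, curl, harm)
    print("L1 @ harmonic:", hodge_laplacian(B1, B2) @ harm)

    # Triangle filled: the same circulation is now the boundary of a face,
    # so it moves into the curl part and the harmonic part vanishes.
    B1, B2 = boundary_matrices(3, edges, [(0, 1, 2)])
    grad, curl, harm = hodge_decompose(f, B1, B2)
    print("filled:", grad, curl, harm)
```

The toy case mirrors the pairwise-versus-triplet point in miniature: every edge value is finite and locally unremarkable, yet the cyclic part of the signal is only visible once the decomposition separates it from the gradient part, and whether it counts as harmonic depends on which triangles the complex includes.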