FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach

Abstract

The scaling law of fine-grained MoE shows that model performance stops improving once the granularity of the intermediate dimension exceeds an optimal threshold, which limits further gains from fine-grained design along a single dimension. To address this bottleneck, we propose FineRMoE (FineR-Grained MoE), an architecture that extends fine-grained expert design to both the intermediate and output dimensions, aiming to enhance expert specialization beyond the single-dimension limit. We further introduce a bi-level sparse forward computation paradigm and a specialized routing mechanism to govern activation. In addition, to avoid the prohibitive cost of training FineRMoE from scratch, we devise a generalized upcycling method that builds FineRMoE cost-effectively. Extensive experiments show that FineRMoE achieves superior performance across ten standard benchmarks. Compared with the strongest baseline, FineRMoE achieves 6 times higher parameter efficiency, 281 times lower prefill latency, and 136 times higher decoding throughput during inference.
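To make the idea of granularity along two dimensions concrete, the following is a minimal NumPy sketch and not the FineRMoE implementation. It assumes a two-layer FFN with a ReLU activation, cuts its weights into blocks along both the intermediate and the output dimension, and evaluates only the blocks selected by a boolean mask. The names `dense_ffn`, `split_ffn`, and `blockwise_ffn` are hypothetical, the mask is fixed by hand and shared across tokens (the abstract does not specify the routing mechanism), and the block layout is only one plausible reading of the abstract.

```python
import numpy as np


def dense_ffn(x, w_in, w_out):
    """Plain two-layer FFN with ReLU (an assumed activation): (n, d) -> (n, d)."""
    return np.maximum(x @ w_in, 0.0) @ w_out


def split_ffn(w_in, w_out, n_inter, n_out):
    """Cut FFN weights into n_inter x n_out blocks (illustrative layout).

    w_in  (d, h) is sliced along the intermediate dimension h.
    w_out (h, d) is sliced along h and along the output dimension d.
    Both dimensions must divide evenly.
    """
    in_slices = np.split(w_in, n_inter, axis=1)
    out_blocks = [np.split(rows, n_out, axis=1)
                  for rows in np.split(w_out, n_inter, axis=0)]
    return in_slices, out_blocks


def blockwise_ffn(x, in_slices, out_blocks, mask):
    """Evaluate only the blocks (i, j) where mask[i, j] is True.

    mask has shape (n_inter, n_out). Output slice j is the sum of the
    contributions from the active intermediate slices i.
    """
    n_inter, n_out = mask.shape
    cols = [np.zeros((x.shape[0], blk.shape[1])) for blk in out_blocks[0]]
    for i in range(n_inter):
        if not mask[i].any():
            continue  # the whole intermediate slice is inactive
        hidden = np.maximum(x @ in_slices[i], 0.0)
        for j in range(n_out):
            if mask[i, j]:
                cols[j] += hidden @ out_blocks[i][j]
    return np.concatenate(cols, axis=1)


rng = np.random.default_rng(0)
d, h = 8, 16
x = rng.normal(size=(4, d))
w_in = rng.normal(size=(d, h))
w_out = rng.normal(size=(h, d))
in_slices, out_blocks = split_ffn(w_in, w_out, n_inter=4, n_out=2)

# With every block active, the block-wise form equals the dense FFN.
full_mask = np.ones((4, 2), dtype=bool)
y_full = blockwise_ffn(x, in_slices, out_blocks, full_mask)

# A sparse mask activates only a subset of blocks.
sparse_mask = np.zeros((4, 2), dtype=bool)
sparse_mask[0, 0] = sparse_mask[2, 1] = True
y_sparse = blockwise_ffn(x, in_slices, out_blocks, sparse_mask)
```

Because ReLU acts elementwise, the all-active case reproduces the dense FFN exactly. This is the kind of identity that makes it plausible to initialize fine-grained blocks from a pretrained dense layer, although the upcycling method described in the abstract may differ in its details.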
