Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Abstract

Diffusion models have emerged as the mainstream framework for visual generation. Building on this success, integrating Mixture of Experts (MoE) methods has shown promise in improving model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By letting tokens and experts compete together and selecting the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow-layer learning, and a router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showing significant performance gains along with promising scaling properties.
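
The abstract does not spell out the routing rule, so the following is only a minimal sketch of one plausible reading: all (token, expert) scores are pooled and the highest-scoring pairs win, so some tokens can receive several experts and others none. The function names, the `budget` parameter, and the toy score matrix are assumptions made for illustration, not the authors' implementation. A per-token top-k baseline is included for contrast.

```python
import heapq
from typing import List, Tuple


def expert_race_select(scores: List[List[float]], budget: int) -> List[List[int]]:
    """Keep the `budget` highest (token, expert) pairs across the whole matrix.

    scores[t][e] is a hypothetical router score for token t and expert e.
    Returns, for each token, the sorted list of expert indices assigned to it.
    """
    # Flatten every (token, expert) pair so they all compete in one pool.
    candidates: List[Tuple[float, int, int]] = [
        (score, t, e)
        for t, row in enumerate(scores)
        for e, score in enumerate(row)
    ]
    winners = heapq.nlargest(budget, candidates, key=lambda c: c[0])

    assignment: List[List[int]] = [[] for _ in scores]
    for _, t, e in winners:
        assignment[t].append(e)
    return [sorted(experts) for experts in assignment]


def token_choice_select(scores: List[List[float]], k: int) -> List[List[int]]:
    """Baseline for contrast: each token independently keeps its top-k experts."""
    return [
        sorted(heapq.nlargest(k, range(len(row)), key=row.__getitem__))
        for row in scores
    ]


# Toy scores (assumed values): token 0 scores highly for every expert.
scores = [
    [0.9, 0.8, 0.7],   # token 0
    [0.2, 0.1, 0.3],   # token 1
    [0.6, 0.05, 0.4],  # token 2
]

race = expert_race_select(scores, budget=3)   # [[0, 1, 2], [], []]
per_token = token_choice_select(scores, k=1)  # [[0], [2], [0]]
print(race, per_token)
```

Both calls activate three (token, expert) pairs in total. Under the pooled selection, all three go to token 0, while the per-token baseline spreads them evenly regardless of score. This is the sense in which a race can concentrate experts on the tokens the router scores as most important.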
