Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
March 20, 2025
作者: Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min
cs.AI
Abstract
Diffusion models have emerged as a mainstream framework in visual generation.
Building upon this success, the integration of Mixture of Experts (MoE) methods
has shown promise in enhancing model scalability and performance. In this
paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with
a flexible routing strategy, Expert Race. By allowing tokens and experts to
compete together and select the top candidates, the model learns to dynamically
assign experts to critical tokens. Additionally, we propose per-layer
regularization to address challenges in shallow-layer learning, and a router
similarity loss to prevent mode collapse, ensuring better expert utilization.
Extensive experiments on ImageNet validate the effectiveness of our approach,
showcasing significant performance gains while exhibiting promising scaling properties.
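To make the routing idea concrete, below is a minimal PyTorch sketch of the "race" the abstract describes: rather than each token independently selecting its own top-k experts, all token-expert affinity scores compete in a single global top-k. The function name `expert_race_route`, the sigmoid gating, and the tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch

def expert_race_route(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Global top-k routing: every (token, expert) pair races together.

    logits: (num_tokens, num_experts) router affinity scores.
    Returns a gating matrix of the same shape with nonzero weights only
    on the k winning pairs.
    """
    num_tokens, num_experts = logits.shape
    flat = logits.reshape(-1)              # pool all token-expert pairs
    topk_vals, topk_idx = flat.topk(k)     # one global competition, not per-token
    gates = torch.zeros_like(flat)
    # Assumption: sigmoid gating on the winners; the paper may normalize differently.
    gates[topk_idx] = torch.sigmoid(topk_vals)
    return gates.reshape(num_tokens, num_experts)

if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 4)              # 8 tokens, 4 experts
    gates = expert_race_route(logits, k=8)  # budget of 8 activations, shared globally
    print((gates > 0).sum(dim=1))           # per-token expert counts are uneven
```

Because the activation budget k is shared across all tokens rather than fixed per token, the router can spend several experts on hard tokens and none on easy ones, which is the flexible, dynamic assignment the abstract refers to.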