
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

March 20, 2025
Authors: Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min
cs.AI

Abstract

Diffusion models have emerged as a mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow-layer learning, and a router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, demonstrating significant performance gains alongside promising scaling properties.
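The core idea of Expert Race, as described in the abstract, is that routing scores from all tokens and all experts compete in a single global top-k, rather than each token independently selecting a fixed number of experts. Below is a minimal PyTorch sketch of that idea; the function name `expert_race_route`, the softmax normalization, and the tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def expert_race_route(router_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Global top-k routing sketch: all (token, expert) pairs race together.

    router_logits: [num_tokens, num_experts] affinity scores from the router.
    k: total number of (token, expert) assignments kept across all tokens,
       rather than a fixed number of experts per token.

    Returns a dense gating matrix [num_tokens, num_experts] with
    unselected pairs zeroed out.
    """
    num_tokens, num_experts = router_logits.shape
    scores = router_logits.softmax(dim=-1)  # normalization choice is an assumption

    # Flatten so every token-expert pair competes in one global top-k,
    # instead of each token independently picking its own top experts.
    flat = scores.flatten()                      # [num_tokens * num_experts]
    top_vals, top_idx = flat.topk(k)

    gates = torch.zeros_like(flat)
    gates[top_idx] = top_vals                    # keep only the winning pairs
    return gates.view(num_tokens, num_experts)   # experts per token now varies


# Example: 8 tokens, 4 experts, 16 assignments in total (2 per token on average).
logits = torch.randn(8, 4)
gates = expert_race_route(logits, k=16)
print((gates > 0).sum(dim=-1))  # per-token expert counts; they need not be equal
```

Because the top-k is taken over the flattened score matrix, tokens that the router deems critical can win more expert slots than others, which is the flexibility the abstract attributes to this routing strategy.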
