DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
March 18, 2025
Authors: Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai
cs.AI
Abstract
Diffusion models have demonstrated remarkable success in various image
generation tasks, but their performance is often limited by the uniform
processing of inputs across varying conditions and noise levels. To address
this limitation, we propose a novel approach that leverages the inherent
heterogeneity of the diffusion process. Our method, DiffMoE, introduces a
batch-level global token pool that enables experts to access global token
distributions during training, promoting specialized expert behavior. To
unleash the full potential of the diffusion process, DiffMoE incorporates a
capacity predictor that dynamically allocates computational resources based on
noise levels and sample complexity. Through comprehensive evaluation, DiffMoE
achieves state-of-the-art performance among diffusion models on the ImageNet
benchmark, substantially outperforming both dense architectures with 3x
activated parameters and existing MoE approaches while maintaining 1x activated
parameters. The effectiveness of our approach extends beyond class-conditional
generation to more challenging tasks such as text-to-image generation,
demonstrating its broad applicability across different diffusion model
applications. Project Page: https://shiml20.github.io/DiffMoE/
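To make the two mechanisms described in the abstract more concrete, the sketch below shows one way a batch-level global token pool and a capacity predictor could be combined in a mixture-of-experts feed-forward layer. This is a minimal, simplified illustration, not the authors' implementation: the class and parameter names (DiffMoELayerSketch, capacity_predictor, num_experts, and so on) are hypothetical, and details such as averaging the predicted keep-ratio into a single per-expert capacity are simplifying assumptions.

```python
# Hedged sketch: batch-level token-pool MoE with a learned capacity predictor.
# All names and design details here are illustrative assumptions.
import torch
import torch.nn as nn


class DiffMoELayerSketch(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, hidden_mult: int = 4):
        super().__init__()
        self.num_experts = num_experts
        # One feed-forward expert per slot.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(dim, hidden_mult * dim),
                nn.GELU(),
                nn.Linear(hidden_mult * dim, dim),
            )
            for _ in range(num_experts)
        ])
        # Router scores every token against every expert.
        self.router = nn.Linear(dim, num_experts)
        # Capacity predictor: maps token features (which in a diffusion
        # transformer already carry noise-level information via timestep
        # conditioning) to a keep-ratio in (0, 1).
        self.capacity_predictor = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Flatten batch and sequence so experts
        # select tokens from a batch-level global pool rather than per sample.
        b, s, d = x.shape
        tokens = x.reshape(b * s, d)

        probs = self.router(tokens).softmax(dim=-1)   # (b*s, num_experts)

        # Predict how many tokens each expert should process; averaging the
        # per-token ratios into one capacity per expert is a simplification.
        keep_ratio = self.capacity_predictor(tokens).mean()
        capacity = max(1, int(keep_ratio * tokens.shape[0] / self.num_experts))

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Each expert processes its top-`capacity` tokens from the pool,
            # weighted by the routing probability.
            top_vals, top_idx = probs[:, e].topk(capacity)
            out[top_idx] += top_vals.unsqueeze(-1) * expert(tokens[top_idx])

        return out.reshape(b, s, d)
```

In this reading, pooling tokens across the whole batch lets each expert see the global token distribution at a given training step, and the predicted capacity lets the layer spend more computation on harder samples or noise levels instead of processing every token with a fixed top-k budget.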