DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
March 18, 2025
Authors: Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai
cs.AI
Abstract
Diffusion models have demonstrated remarkable success in various image
generation tasks, but their performance is often limited by the uniform
processing of inputs across varying conditions and noise levels. To address
this limitation, we propose a novel approach that leverages the inherent
heterogeneity of the diffusion process. Our method, DiffMoE, introduces a
batch-level global token pool that enables experts to access global token
distributions during training, promoting specialized expert behavior. To
unleash the full potential of the diffusion process, DiffMoE incorporates a
capacity predictor that dynamically allocates computational resources based on
noise levels and sample complexity. Through comprehensive evaluation, DiffMoE
achieves state-of-the-art performance among diffusion models on the ImageNet
benchmark, substantially outperforming both dense architectures with 3x
activated parameters and existing MoE approaches while maintaining 1x activated
parameters. The effectiveness of our approach extends beyond class-conditional
generation to more challenging tasks such as text-to-image generation,
demonstrating its broad applicability across different diffusion model
applications. Project Page: https://shiml20.github.io/DiffMoE/
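To make the two mechanisms described in the abstract more concrete, the sketch below shows one way a batch-level global token pool and a capacity predictor could be combined in a mixture-of-experts feed-forward layer. This is a minimal, simplified illustration, not the authors' implementation: the class and parameter names (DiffMoELayerSketch, capacity_predictor, num_experts, and so on) are hypothetical, and details such as averaging the predicted keep-ratio into a single per-expert capacity are simplifying assumptions.

```python
# Hedged sketch: batch-level token-pool MoE with a learned capacity predictor.
# All names and design details here are illustrative assumptions.
import torch
import torch.nn as nn


class DiffMoELayerSketch(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, hidden_mult: int = 4):
        super().__init__()
        self.num_experts = num_experts
        # One feed-forward expert per slot.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(dim, hidden_mult * dim),
                nn.GELU(),
                nn.Linear(hidden_mult * dim, dim),
            )
            for _ in range(num_experts)
        ])
        # Router scores every token against every expert.
        self.router = nn.Linear(dim, num_experts)
        # Capacity predictor: maps token features (which in a diffusion
        # transformer already carry noise-level information via timestep
        # conditioning) to a keep-ratio in (0, 1).
        self.capacity_predictor = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Flatten batch and sequence so experts
        # select tokens from a batch-level global pool rather than per sample.
        b, s, d = x.shape
        tokens = x.reshape(b * s, d)

        probs = self.router(tokens).softmax(dim=-1)   # (b*s, num_experts)

        # Predict how many tokens each expert should process; averaging the
        # per-token ratios into one capacity per expert is a simplification.
        keep_ratio = self.capacity_predictor(tokens).mean()
        capacity = max(1, int(keep_ratio * tokens.shape[0] / self.num_experts))

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Each expert processes its top-`capacity` tokens from the pool,
            # weighted by the routing probability.
            top_vals, top_idx = probs[:, e].topk(capacity)
            out[top_idx] += top_vals.unsqueeze(-1) * expert(tokens[top_idx])

        return out.reshape(b, s, d)
```

In this reading, pooling tokens across the whole batch lets each expert see the global token distribution at a given training step, and the predicted capacity lets the layer spend more computation on harder samples or noise levels instead of processing every token with a fixed top-k budget.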