DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
March 18, 2025
Authors: Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai
cs.AI
Abstract
Diffusion models have demonstrated remarkable success in various image
generation tasks, but their performance is often limited by the uniform
processing of inputs across varying conditions and noise levels. To address
this limitation, we propose a novel approach that leverages the inherent
heterogeneity of the diffusion process. Our method, DiffMoE, introduces a
batch-level global token pool that enables experts to access global token
distributions during training, promoting specialized expert behavior. To
unleash the full potential of the diffusion process, DiffMoE incorporates a
capacity predictor that dynamically allocates computational resources based on
noise levels and sample complexity. Through comprehensive evaluation, DiffMoE
achieves state-of-the-art performance among diffusion models on the ImageNet
benchmark, substantially outperforming both dense architectures with 3x
activated parameters and existing MoE approaches while maintaining 1x activated
parameters. The effectiveness of our approach extends beyond class-conditional
generation to more challenging tasks such as text-to-image generation,
demonstrating its broad applicability across different diffusion model
applications. Project Page: https://shiml20.github.io/DiffMoE/
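The batch-level global token pool and capacity-based expert selection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `diffmoe_layer`, the linear router and experts, and the fixed `capacity_ratio` are illustrative stand-ins; in the paper, a learned capacity predictor decides how much computation to allocate based on noise level and sample complexity.

```python
import numpy as np

def diffmoe_layer(tokens, router_w, expert_ws, capacity_ratio=0.5):
    """Hypothetical sketch of batch-level dynamic token selection.

    tokens:    (n, d) array -- ALL tokens in the batch flattened into one
               global pool, so experts see the batch-wide token distribution
               rather than routing per sample.
    router_w:  (d, E) routing projection.
    expert_ws: list of E per-expert weight matrices, each (d, d).
    capacity_ratio: fixed stand-in for the learned capacity predictor.
    """
    n, d = tokens.shape
    num_experts = len(expert_ws)

    # Router scores: softmax over experts for every token in the pool.
    logits = tokens @ router_w                                  # (n, E)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Capacity: how many tokens each expert processes from the global pool.
    capacity = int(np.ceil(capacity_ratio * n))

    out = np.zeros_like(tokens)
    for e in range(num_experts):
        # Each expert selects its top-`capacity` tokens batch-wide.
        top = np.argsort(-probs[:, e])[:capacity]
        out[top] += probs[top, e, None] * (tokens[top] @ expert_ws[e])
    return out
```

Because selection happens over the whole batch, "easy" tokens (e.g. at low noise levels) can be skipped by most experts while hard tokens receive more computation, which is the behavior the capacity predictor is meant to exploit.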