
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

February 20, 2025
Authors: Ke Cao, Jing Wang, Ao Ma, Jiasong Feng, Zhanjie Zhang, Xuanhua He, Shanyuan Liu, Bo Cheng, Dawei Leng, Yuhui Yin, Jie Zhang
cs.AI

Abstract

The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controlled diffusion transformer methods incur significant parameter and computational overheads and suffer from inefficient resource allocation because they fail to account for the varying relevance of control information across different transformer layers. To address this, we propose the Relevance-Guided Efficient Controllable Generation framework, RelaCtrl, which enables efficient and resource-optimized integration of control signals into the Diffusion Transformer. First, we evaluate the relevance of each layer in the Diffusion Transformer to the control information by assessing the "ControlNet Relevance Score", i.e., the impact of skipping each control layer on both generation quality and control effectiveness during inference. Based on the strength of this relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computation. Additionally, to further improve efficiency, we replace the self-attention and FFN in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), enabling an efficient implementation of both the token mixer and the channel mixer. Both qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity of PixArt-delta. More examples are available at https://relactrl.github.io/RelaCtrl/.
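
As a rough illustration of the skip-and-measure procedure the abstract describes, the sketch below estimates a per-layer "ControlNet Relevance Score" by disabling one control layer at a time and measuring the resulting drop in generation quality and control fidelity. It is a minimal sketch, not the authors' code: the `pipeline` interface with a `skip_control_layers` argument, the `quality_metric` and `control_metric` callables, and the weighting `alpha` are all hypothetical stand-ins for whatever metrics and scoring the paper actually uses.

```python
# Illustrative sketch only: the pipeline interface, metrics, and weighting
# below are assumptions, not the RelaCtrl implementation.
from typing import Callable, Dict, List


def relevance_scores(
    pipeline,                      # hypothetical controllable DiT pipeline with per-layer control injection
    prompts: List[str],
    conditions: List,              # control signals (e.g., edge or depth maps) paired with the prompts
    quality_metric: Callable,      # lower is better, e.g., FID against reference images
    control_metric: Callable,      # lower is better, e.g., error between input and re-extracted condition
    num_control_layers: int,
    alpha: float = 0.5,            # assumed weighting between quality and controllability terms
) -> Dict[int, float]:
    """Return one score per control layer: larger means skipping that layer hurts more."""
    # Baseline run with every control layer active.
    base_images = pipeline(prompts, conditions, skip_control_layers=[])
    base_quality = quality_metric(base_images)
    base_control = control_metric(base_images, conditions)

    scores: Dict[int, float] = {}
    for layer in range(num_control_layers):
        # Re-run inference with exactly one control layer disabled.
        images = pipeline(prompts, conditions, skip_control_layers=[layer])
        d_quality = quality_metric(images) - base_quality
        d_control = control_metric(images, conditions) - base_control
        # Layers whose removal degrades quality/control the most are the most relevant.
        scores[layer] = alpha * d_quality + (1 - alpha) * d_control
    return scores
```

Under this reading, layers with low scores are the natural candidates for lighter-weight control blocks (or none at all), which is the allocation idea the framework builds on.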
