
ZigMa: Zigzag Mamba Diffusion Model

March 20, 2024
作者: Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer
cs.AI

Abstract
The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the lack of consideration for spatial continuity in the scan scheme of Mamba. Secondly, building upon this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba, which outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ 1024×1024 and UCF101, MultiModal-CelebA-HQ, and MS COCO 256×256. Code will be released at https://taohu.me/zigma/
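The spatial-continuity issue the abstract raises can be illustrated with a minimal sketch (not taken from the paper's code): a plain raster scan makes the last token of one row jump back to the far side of the grid, while a zigzag (boustrophedon) scan keeps every pair of consecutive tokens spatially adjacent. The function names and the NumPy formulation here are illustrative assumptions; the paper's actual Zigzag Mamba alternates among several such zigzag paths across its blocks.

```python
import numpy as np

def zigzag_scan_indices(h: int, w: int) -> np.ndarray:
    """Return a 1-D ordering of an h x w patch grid in which every
    consecutive pair of indices is spatially adjacent (a zigzag scan)."""
    idx = np.arange(h * w).reshape(h, w)
    # Reverse every other row so the scan snakes back instead of jumping.
    idx[1::2] = idx[1::2, ::-1].copy()
    return idx.reshape(-1)

def zigzag_flatten(tokens: np.ndarray, h: int, w: int) -> np.ndarray:
    """Flatten patch tokens of shape (h*w, d) into a spatially
    continuous 1-D sequence for a sequence model such as Mamba."""
    return tokens[zigzag_scan_indices(h, w)]

# On a 2x3 grid, a raster scan gives [0, 1, 2, 3, 4, 5] with a
# discontinuous jump from index 2 to index 3; the zigzag scan gives
# [0, 1, 2, 5, 4, 3], where each step moves to a neighboring patch.
print(zigzag_scan_indices(2, 3))
```

Because the reordering is a fixed permutation, it adds no learnable parameters, which is consistent with the abstract's description of the method as plug-and-play and zero-parameter.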
