VSSD: Vision Mamba with Non-Causal State Space Duality
July 26, 2024
Authors: Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu
cs.AI
Abstract
Vision transformers have significantly advanced the field of computer vision,
offering robust modeling capabilities and a global receptive field. However,
their high computational demands limit their applicability in processing long
sequences. To tackle this issue, State Space Models (SSMs) have gained
prominence in vision tasks as they offer linear computational complexity.
Recently, State Space Duality (SSD), an improved variant of SSMs, was
introduced in Mamba2 to enhance model performance and efficiency. However, the
inherent causal nature of SSD/SSMs restricts their application in non-causal
vision tasks. To address this limitation, we introduce the Visual State Space
Duality (VSSD) model, which has a non-causal format of SSD. Specifically, we
propose to discard the magnitude of interactions between the hidden state and
tokens while preserving their relative weights, which relieves the dependence
of each token's contribution on previous tokens. Together with the involvement of
multi-scan strategies, we show that the scanning results can be integrated to
achieve non-causality, which not only improves the performance of SSD in vision
tasks but also enhances its efficiency. We conduct extensive experiments on
various benchmarks including image classification, detection, and segmentation,
where VSSD surpasses existing state-of-the-art SSM-based models. Code and
weights are available at https://github.com/YuHengsss/VSSD.
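The abstract describes the non-causal idea only in words, so the following is a minimal NumPy sketch of one possible reading, not the authors' implementation. It assumes a scalar-decay state space recurrence for the causal case, and for the non-causal case it assumes that each token contributes to a single shared hidden state with a per-token weight and no accumulated decay. The function names (`causal_ssd`, `noncausal_ssd`, `merge_bidirectional_scans`) and the weight vector `w` are hypothetical and chosen for illustration; see the linked repository for the actual model.

```python
import numpy as np


def causal_ssd(x, a, B, C):
    """Causal scan (assumed form): h_t = a_t * h_{t-1} + outer(B_t, x_t), y_t = C_t @ h_t.

    x: (L, D) tokens, a: (L,) scalar decays, B and C: (L, N) projections.
    """
    L, D = x.shape
    N = B.shape[1]
    h = np.zeros((N, D))
    y = np.zeros((L, D))
    for t in range(L):
        # The decay a_t rescales everything accumulated so far, so the
        # effective contribution of a token depends on the tokens after it.
        h = a[t] * h + np.outer(B[t], x[t])
        y[t] = C[t] @ h
    return y


def noncausal_ssd(x, w, B, C):
    """Non-causal variant (assumed form): one global state shared by all tokens.

    Each token adds w_t * outer(B_t, x_t) to the state; no decay is applied
    to the accumulated state, so only the relative weights w matter.
    """
    H = np.einsum("l,ln,ld->nd", w, B, x)  # global hidden state, shape (N, D)
    return C @ H                            # every token reads the same state


def merge_bidirectional_scans(x, w, B):
    """Combine a forward and a backward decay-free scan into a per-token state.

    Returns an (L, N, D) array; under the assumptions above every row equals
    the global state, which is why the scan results can be merged.
    """
    Z = w[:, None, None] * B[:, :, None] * x[:, None, :]  # per-token contributions
    fwd = np.cumsum(Z, axis=0)                # sum over tokens i <= t
    bwd = np.cumsum(Z[::-1], axis=0)[::-1]    # sum over tokens i >= t
    return fwd + bwd - Z                      # token t was counted twice
```

In this sketch, changing the last token leaves the first output of `causal_ssd` untouched but changes the first output of `noncausal_ssd`, which is the sense in which the second form is non-causal. It also costs a single weighted sum plus one matrix product, so it stays linear in sequence length.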