Sparsified State-Space Models are Efficient Highway Networks
May 27, 2025
Authors: Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin
cs.AI
Abstract
State-space models (SSMs) offer a promising architecture for sequence
modeling, providing an alternative to Transformers by replacing expensive
self-attention with linear recurrences. In this paper, we propose a simple yet
effective trick to enhance SSMs within given computational budgets by
sparsifying them. Our intuition is that tokens in SSMs are highly redundant due
to gradual recurrent updates, and dense recurrence operations block the
delivery of past information. In particular, we observe that upper layers of
SSMs tend to be more redundant as they encode global information, while lower
layers encode local information. Motivated by this, we introduce Simba, a
hierarchical sparsification method for SSMs based on token pruning. Simba
sparsifies upper layers more than lower layers, encouraging the upper layers to
behave like highways. To achieve this, we propose a novel token pruning
criterion for SSMs, measuring the global impact of tokens on the final output
by accumulating local recurrences. We demonstrate that Simba outperforms the
baseline model, Mamba, with the same FLOPs in various natural language tasks.
Moreover, we illustrate the effect of highways, showing that Simba not only
enhances efficiency but also improves the information flow across long
sequences. Code is available at https://github.com/woominsong/Simba.
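To make the pruning idea concrete, the sketch below illustrates hierarchical token pruning for a diagonal linear-recurrence SSM: a keep ratio that shrinks with layer depth (so upper layers are sparsified more), and a token score that accumulates local recurrent decay gates to estimate how much of each token's contribution survives to the final output. The function names, shapes, and exact scoring rule here are illustrative assumptions, not the authors' implementation; see the released code at https://github.com/woominsong/Simba for the actual method.

```python
# Minimal sketch (assumed details, not the official Simba implementation) of
# hierarchical token pruning for a diagonal linear-recurrence SSM.
import numpy as np

def global_influence_scores(decay):
    """Score each token by the accumulated product of per-step decay gates
    from the step after the token up to the end of the sequence, i.e. an
    estimate of how much of its contribution reaches the final output.

    decay: (T, d_state) array of per-step recurrent decay factors in (0, 1).
    Returns a (T,) score; tokens whose influence decays quickly score low.
    """
    log_decay = np.log(decay)                           # (T, d_state)
    suffix = np.cumsum(log_decay[::-1], axis=0)[::-1]   # sum over steps t..T-1
    suffix = suffix - log_decay                         # exclude the token's own step
    return np.exp(suffix).mean(axis=-1)                 # average over state dims

def hierarchical_keep_ratios(num_layers, base_keep=1.0, top_keep=0.5):
    """Keep ratio decreases with depth: lower layers keep all tokens,
    upper layers are pruned more aggressively (illustrative schedule)."""
    return np.linspace(base_keep, top_keep, num_layers)

def prune_tokens(x, decay, keep_ratio):
    """Keep the top-k tokens by global influence, preserving their order."""
    T = x.shape[0]
    k = max(1, int(round(T * keep_ratio)))
    scores = global_influence_scores(decay)
    keep = np.sort(np.argsort(scores)[-k:])
    return x[keep], keep

# Toy usage: a 3-layer stack over a sequence of 8 tokens with 4 state dims.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
for layer, r in enumerate(hierarchical_keep_ratios(3)):
    T = x.shape[0]
    decay = rng.uniform(0.5, 0.99, size=(T, 4))  # stand-in for learned gates
    x, kept = prune_tokens(x, decay, r)
    print(f"layer {layer}: keep_ratio={r:.2f}, kept {len(kept)}/{T} tokens")
```

The schedule above keeps every token in the lowest layer and drops progressively more in upper layers, which is the "highway" behavior the abstract describes; in practice the decay gates would come from the SSM's learned, input-dependent state-transition parameters rather than random values.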