Sparsified State-Space Models are Efficient Highway Networks

May 27, 2025
Authors: Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin
cs.AI

Abstract

State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning criterion for SSMs, measuring the global impact of tokens on the final output by accumulating local recurrences. We demonstrate that Simba outperforms the baseline model, Mamba, with the same FLOPS in various natural language tasks. Moreover, we illustrate the effect of highways, showing that Simba not only enhances efficiency but also improves the information flow across long sequences. Code is available at https://github.com/woominsong/Simba.
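
The pruning idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation (see the linked repository for that). It assumes a diagonal linear recurrence h_t = a_t * h_{t-1} + bx_t and scores each token by how much of its contribution survives to the final state, which is one plausible reading of "accumulating local recurrences". The function names, the linear keep schedule, and the min_keep value are hypothetical.

```python
import numpy as np

def influence_scores(a, bx):
    """Score each token by its contribution to the final state of the
    assumed recurrence h_t = a_t * h_{t-1} + bx_t.

    a:  (T, D) per-step decay factors (assumed form, not from the paper)
    bx: (T, D) per-step input contributions
    Token t reaches the final state scaled by the product of all later
    decays, so the local decays are accumulated from the end backwards.
    """
    T = a.shape[0]
    # suffix[t] = product of a[s] for s > t; the last token is unscaled.
    suffix = np.ones_like(a)
    for t in range(T - 2, -1, -1):
        suffix[t] = suffix[t + 1] * a[t + 1]
    return np.linalg.norm(suffix * bx, axis=1)

def prune_tokens(x, scores, keep_ratio):
    """Keep the top-scoring tokens, preserving their original order."""
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.sort(np.argsort(scores)[-k:])
    return x[keep], keep

def keep_schedule(num_layers, min_keep=0.25):
    """Hypothetical linear schedule: dense lower layers, sparse upper layers."""
    return np.linspace(1.0, min_keep, num_layers)
```

In this sketch, a model would apply prune_tokens between layers using the ratio from keep_schedule for that depth, so upper layers process fewer tokens and each recurrence step spans a longer stretch of the original sequence. With uniform inputs the toy score simply favors recent tokens; a learned, input-dependent decay is what would make the ranking informative.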

