
Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window

June 23, 2023
Authors: Jinkyu Koo, John Yang, Le An, Gwenaelle Cunha Sergio, Su Inn Park
cs.AI

Abstract

Transformer models have shown great potential in computer vision, following their success in language tasks. Swin Transformer is one such model: it outperforms convolution-based architectures in accuracy while being more efficient than Vision Transformer (ViT) and its variants, which have quadratic complexity with respect to the input size. Swin Transformer uses shifting windows, which allow cross-window connections while limiting self-attention computation to non-overlapping local windows. However, shifting windows introduces memory copy operations, which account for a significant portion of its runtime. To mitigate this issue, we propose Swin-Free, which applies size-varying windows across stages, instead of shifting windows, to achieve cross-connection among local windows. With this simple design change, Swin-Free runs faster than Swin Transformer at inference and achieves better accuracy. We also propose a few Swin-Free variants that are faster than their Swin Transformer counterparts.
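The contrast between shifted windows and size-varying windows can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`window_partition`, `shifted_window_partition`, `window_index`, `attention_pairs`), the 8x8 feature map, and the window sizes 4 and 8 are all assumptions chosen for illustration, not configurations taken from the paper.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, C)


def shifted_window_partition(x, window_size):
    """Shifted-window variant: cyclically roll the map, then partition.

    np.roll materializes a rolled copy of the whole feature map before
    partitioning; it stands in here for the extra memory copy that the
    abstract attributes to shifting windows.
    """
    shift = window_size // 2
    rolled = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(rolled, window_size)


def window_index(row, col, window_size, width):
    """Index of the window that contains the token at (row, col)."""
    return (row // window_size) * (width // window_size) + (col // window_size)


def attention_pairs(H, W, window_size):
    """Number of query-key pairs when attention is limited to local windows."""
    num_windows = (H // window_size) * (W // window_size)
    tokens_per_window = window_size * window_size
    return num_windows * tokens_per_window ** 2


# Hypothetical 8x8 single-channel feature map.
x = np.arange(64).reshape(8, 8, 1)

small = window_partition(x, 4)            # four 4x4 windows
large = window_partition(x, 8)            # one 8x8 window, no roll needed
shifted = shifted_window_partition(x, 4)  # four 4x4 windows after a roll

# Tokens (3, 3) and (3, 4) are neighbors but fall into different 4x4 windows;
# a larger window at another stage places them in the same window.
same_small = window_index(3, 3, 4, 8) == window_index(3, 4, 4, 8)
same_large = window_index(3, 3, 8, 8) == window_index(3, 4, 8, 8)
print(small.shape, large.shape, shifted.shape, same_small, same_large)
print(attention_pairs(8, 8, 4), attention_pairs(8, 8, 8))
```

In this sketch, changing the window size between stages lets neighboring tokens that were separated by a window boundary attend to each other without the roll step. The trade-off is that a larger window raises the number of query-key pairs, as `attention_pairs` shows.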