

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

May 19, 2024
Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
cs.AI

Abstract

Transformers have become foundational architectures for both natural language and computer vision tasks. However, their high computational cost makes them quite challenging to deploy on resource-constrained devices. This paper investigates the computational bottleneck modules of efficient transformers, i.e., normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is not computationally friendly due to the statistics calculation during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse. To address this problem, we propose a novel method named PRepBN that progressively replaces LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective in achieving strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with 16.2 ms latency, which is 2.4 ms less than that of Flatten-Swin while achieving 0.1% higher accuracy. We also evaluated our method on language modeling tasks and obtained comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
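To make the core idea of progressively swapping LayerNorm for BatchNorm concrete, below is a minimal PyTorch sketch. It assumes the progressive schedule is a linear decay of a mixing weight `lam` from 1 to 0 over training, so the block gradually shifts from the LayerNorm branch to the inference-friendly BatchNorm branch. The class name `ProgressiveNorm`, the linear schedule, and the use of a plain `BatchNorm1d` (without the re-parameterization step) are illustrative assumptions, not the authors' exact PRepBN implementation.

```python
import torch
import torch.nn as nn


class ProgressiveNorm(nn.Module):
    """Sketch of a progressive LayerNorm-to-BatchNorm replacement.

    Output is lam * LayerNorm(x) + (1 - lam) * BatchNorm(x), where lam
    decays linearly from 1 to 0 over `total_steps` training steps, so the
    model ends training using only the BatchNorm branch.
    """

    def __init__(self, dim: int, total_steps: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.bn = nn.BatchNorm1d(dim)
        self.total_steps = total_steps
        # Track the current training step as a buffer so it is saved with the model.
        self.register_buffer("step", torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, tokens, dim).
        lam = torch.clamp(1.0 - self.step / self.total_steps, min=0.0).item()
        ln_out = self.ln(x)
        # BatchNorm1d expects (batch, dim, tokens), so transpose around it.
        bn_out = self.bn(x.transpose(1, 2)).transpose(1, 2)
        if self.training:
            self.step += 1
        return lam * ln_out + (1.0 - lam) * bn_out
```

Such a block could drop in wherever a transformer layer currently applies LayerNorm; once `lam` reaches 0, only the BatchNorm branch remains, which avoids per-token statistics computation at inference time.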
