

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

May 19, 2024
Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
cs.AI

Abstract

Transformers have become foundational architectures for both natural language and computer vision tasks. However, their high computational cost makes them quite challenging to deploy on resource-constrained devices. This paper investigates the computational bottleneck modules of efficient transformers, i.e., the normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is not computationally friendly due to the statistics it computes during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse. To address this problem, we propose a novel method named PRepBN to progressively replace LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective in achieving strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with 16.2 ms latency, which is 2.4 ms less than that of Flatten-Swin with 0.1% higher accuracy. We also evaluated our method on the language modeling task and obtained comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
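The abstract only outlines PRepBN at a high level. As a rough illustration of the idea, the sketch below shows one way a progressive LayerNorm-to-BatchNorm replacement could be wired up: a blending coefficient `lam` starts at 1 (pure LayerNorm) and is decayed towards 0 (pure re-parameterized BatchNorm) by the training loop. The module names, the decay schedule, and the learnable residual scale `eta` inside `RepBN` are assumptions made for illustration, not the authors' exact implementation; see the linked repositories for the official code.

```python
import torch
import torch.nn as nn

class RepBN(nn.Module):
    """BatchNorm plus a learnable scaled identity branch (assumed form).

    After training, the identity branch with scale `eta` can be folded into
    the BatchNorm's affine parameters, so inference cost equals plain BN.
    """
    def __init__(self, dim):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.eta = nn.Parameter(torch.ones(1))

    def forward(self, x):               # x: (batch, tokens, dim)
        x = x.transpose(1, 2)           # BatchNorm1d expects (batch, dim, tokens)
        x = self.bn(x) + self.eta * x
        return x.transpose(1, 2)

class PRepBN(nn.Module):
    """Progressively blends LayerNorm into RepBN during training (sketch).

    `lam` is a buffer set by the training loop; only the RepBN branch is
    used once `lam` reaches zero and at inference time.
    """
    def __init__(self, dim):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.repbn = RepBN(dim)
        self.register_buffer("lam", torch.tensor(1.0))

    def forward(self, x):
        if self.training and self.lam > 0:
            return self.lam * self.ln(x) + (1 - self.lam) * self.repbn(x)
        return self.repbn(x)

# Hypothetical usage: decay `lam` linearly over the first part of training.
norm = PRepBN(dim=96)
x = torch.randn(8, 196, 96)             # (batch, tokens, dim)
for step in range(10):
    norm.lam.fill_(max(0.0, 1.0 - step / 5))
    y = norm(x)
print(y.shape)                           # torch.Size([8, 196, 96])
```

A similar drop-in strategy is what allows the final network to run with BatchNorm only, avoiding LayerNorm's per-token statistics computation at inference time.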
