SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Abstract

Transformers have become foundational architectures for both natural language processing and computer vision tasks. However, their high computational cost makes them difficult to deploy on resource-constrained devices. This paper investigates the modules that form the computational bottleneck in efficient transformers, namely normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is computationally unfriendly because it must compute statistics during inference. However, replacing LayerNorm with the more efficient BatchNorm often leads to inferior performance and to collapse during training. To address this problem, we propose a novel method named PRepBN, which progressively replaces LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet achieves strong performance. Extensive experiments on image classification and object detection demonstrate the effectiveness of the proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with 16.2ms latency, which is 2.4ms lower than that of Flatten-Swin, with 0.1% higher accuracy. We also evaluate our method on a language modeling task and obtain comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
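
The progressive replacement described in the abstract can be pictured as a blend of the two normalizations whose weight shifts over the course of training. The following is a minimal Python sketch under stated assumptions: the linear schedule, the function names (`blend_weight`, `progressive_norm`), and the use of fixed running statistics in the BatchNorm branch are illustrative choices, not details given in the abstract. The re-parameterization of BatchNorm and the learnable affine parameters are omitted.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token's features with statistics computed from that token."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def batch_norm_inference(x, running_mean, running_var, eps=1e-5):
    """Normalize with fixed per-feature statistics; nothing is computed per input."""
    return [
        (v - m) / math.sqrt(s + eps)
        for v, m, s in zip(x, running_mean, running_var)
    ]


def blend_weight(step, total_steps):
    """Assumed linear schedule: 1.0 at step 0, falling to 0.0 at total_steps."""
    return max(0.0, 1.0 - step / total_steps)


def progressive_norm(x, running_mean, running_var, step, total_steps):
    """Blend LayerNorm and BatchNorm outputs; the LayerNorm share shrinks over time."""
    lam = blend_weight(step, total_steps)
    ln_out = layer_norm(x)
    bn_out = batch_norm_inference(x, running_mean, running_var)
    return [lam * a + (1.0 - lam) * b for a, b in zip(ln_out, bn_out)]


if __name__ == "__main__":
    # Hypothetical feature vector and running statistics, for illustration only.
    features = [1.0, 2.0, 3.0, 4.0]
    mean = [0.0, 0.0, 0.0, 0.0]
    var = [1.0, 1.0, 1.0, 1.0]
    for step in (0, 5, 10):
        print(step, progressive_norm(features, mean, var, step, total_steps=10))
```

Once the weight reaches zero, only the BatchNorm branch remains, so inference needs no per-input statistics. That is the efficiency argument the abstract makes for replacing LayerNorm.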

