SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Abstract

Transformers have become foundational architectures for both natural language processing and computer vision tasks. However, their high computational cost makes them challenging to deploy on resource-constrained devices. This paper investigates the modules that form the computational bottleneck of efficient transformers, namely the normalization layers and the attention modules. LayerNorm is commonly used in transformer architectures, but it is not computationally friendly because its statistics must be computed during inference. Replacing LayerNorm with the more efficient BatchNorm, however, often leads to inferior performance and to collapse during training. To address this problem, we propose a novel method named PRepBN that progressively replaces LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective in achieving strong performance. Extensive experiments on image classification and object detection demonstrate the effectiveness of the proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with a latency of 16.2 ms, which is 2.4 ms lower than that of Flatten-Swin, with 0.1% higher accuracy. We also evaluated our method on the language modeling task and obtained comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
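To make the idea of progressive replacement concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear blending schedule, omits the learnable affine parameters, and uses plain BatchNorm with stored statistics in place of the re-parameterized variant, whose details the abstract does not give. The names `blend_weight` and `progressive_norm` are hypothetical.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token vector with its own mean and variance.
    These statistics must be recomputed for every input, also at inference."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def batch_norm_inference(x, running_mean, running_var, eps=1e-5):
    """Normalize with stored per-feature statistics.
    Nothing is computed from the input itself at inference time."""
    return [(v - m) / math.sqrt(s + eps)
            for v, m, s in zip(x, running_mean, running_var)]

def blend_weight(step, total_steps):
    """Assumed linear schedule: 1.0 at step 0, 0.0 from total_steps onward."""
    return max(0.0, 1.0 - step / total_steps)

def progressive_norm(x, running_mean, running_var, step, total_steps):
    """Blend LayerNorm and BatchNorm outputs; the LayerNorm share shrinks
    as training proceeds until only BatchNorm remains."""
    gamma = blend_weight(step, total_steps)
    ln = layer_norm(x)
    bn = batch_norm_inference(x, running_mean, running_var)
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(ln, bn)]
```

Under these assumptions the layer behaves as pure LayerNorm at the start of training and as pure BatchNorm at the end, so the normalization used at inference needs no per-input statistics.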
