SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Abstract

Transformers have become foundational architectures for both natural language processing and computer vision tasks. However, their high computational cost makes them difficult to deploy on resource-constrained devices. This paper investigates the modules that form the computational bottleneck of efficient transformers, namely normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is computationally unfriendly because it must compute statistics during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and collapse in training. To address this problem, we propose a novel method named PRepBN that progressively replaces LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet achieves strong performance. Extensive experiments on image classification and object detection demonstrate the effectiveness of the proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with 16.2ms latency, which is 2.4ms lower than that of Flatten-Swin, with 0.1% higher accuracy. We also evaluate our method on a language modeling task and obtain comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
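The abstract says only that LayerNorm is progressively replaced by a re-parameterized BatchNorm during training; it does not give the blending rule. The following is a minimal sketch of one way such a replacement could look, not the authors' implementation. It assumes a linear decay of the blending weight, a BatchNorm that uses stored per-channel statistics, and a scaled identity shortcut `eta * x` standing in for the re-parameterization. All function and parameter names (`blend_weight`, `progressive_norm`, `decay_steps`, `eta`) are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token vector with statistics computed from the input itself."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rep_batch_norm(x, running_mean, running_var, eta=0.0, eps=1e-5):
    """BatchNorm with stored per-channel statistics plus an assumed eta * x shortcut."""
    return [
        (v - m) / math.sqrt(s + eps) + eta * v
        for v, m, s in zip(x, running_mean, running_var)
    ]


def blend_weight(step, decay_steps):
    """Assumed linear schedule: 1.0 at step 0, 0.0 from decay_steps onward."""
    return max(0.0, 1.0 - step / decay_steps)


def progressive_norm(x, step, decay_steps, running_mean, running_var, eta=0.0):
    """Blend LayerNorm and the BatchNorm variant; the LayerNorm share shrinks over training."""
    gamma = blend_weight(step, decay_steps)
    ln = layer_norm(x)
    bn = rep_batch_norm(x, running_mean, running_var, eta)
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(ln, bn)]
```

Under these assumptions the output equals pure LayerNorm at step 0 and the pure BatchNorm variant once `step` reaches `decay_steps`. From then on no input-dependent statistics are needed, which is the inference-time saving the abstract refers to.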
