I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Abstract

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training and inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but suffers larger performance drops at lower bit widths. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under coarse-grained quantization granularity for post-LayerNorm activations. I&S-ViT then addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and an accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
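The two quantizers named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give SULQ's exact formulation, so this is not the authors' implementation. It only shows the general idea of a plain log2 quantizer next to a "shift, then quantize uniformly in the log2 domain" variant. The function names, the `shift` parameter, and the use of the data's own minimum and maximum as the uniform range are all assumptions made for illustration.

```python
import numpy as np

def log2_quantize(x, bits):
    """Plain log2 quantizer for values in (0, 1].

    Stores round(-log2(x)) as an integer code and returns the
    dequantized value 2**(-code).
    """
    levels = 2 ** bits - 1
    # Guard against log2(0); tiny inputs fall into the last code.
    code = np.clip(np.round(-np.log2(np.maximum(x, 1e-12))), 0, levels)
    return 2.0 ** (-code)

def shift_uniform_log2_quantize(x, bits, shift):
    """Hypothetical SULQ-style sketch (an assumption, not the paper's formula).

    1. Add a positive shift so the log2-domain range is bounded.
    2. Quantize the log2-domain values uniformly over their observed range.
    3. Invert the log2 transform and remove the shift.
    """
    levels = 2 ** bits - 1
    t = -np.log2(x + shift)
    t_min, t_max = t.min(), t.max()
    scale = max((t_max - t_min) / levels, 1e-12)  # uniform step size
    code = np.clip(np.round((t - t_min) / scale), 0, levels)
    t_hat = code * scale + t_min
    return 2.0 ** (-t_hat) - shift

# Usage on a softmax-like vector of attention probabilities.
probs = np.array([0.01, 0.04, 0.15, 0.80])
plain = log2_quantize(probs, bits=3)
shifted = shift_uniform_log2_quantize(probs, bits=3, shift=0.05)
```

In this sketch the plain quantizer can only output powers of two, whereas the shifted variant spreads its codes uniformly across the log2-domain range that the inputs actually occupy.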
